Hi,

Apologies if this question has been asked before (I am fairly certain it has).

I am building a triple store and need to build a stats table that will be used for query optimization (i.e. to re-order the triple patterns in a query). There may be more than two solutions for this, but the two I know are:

1. Manually rebuild the whole stats table, for example once per day.
This option is expensive because we recalculate the counts over every row in the master table, but the result is that no further computation is needed when we retrieve the stat info. For example, we would just query the stats table for the word 'foo', and it would return a single row with the total number of items for that word.
2. Use an Accumulo combiner.
With this option, we simply add a counter entry to the stats table (i.e. insert ['foo', 1]) whenever we insert 'foo' into the master table. When we want the stat info at query time, Accumulo aggregates all the counts for the word 'foo', in map-reduce fashion.

For #2 we pay the cost at scan time, but if the number of rows containing the word 'foo' is only in the hundreds, I guess it won't be so bad, because the aggregation is done on the server side (and it should be optimized, given Accumulo's design).

I prefer option #2, but I am not sure how expensive it is on Accumulo, especially since we will run a large number of queries per day, whereas the stats recalculation runs only once per day.

Any comments on this? Please let me know if my problem statement or the question is unclear.

Thanks,
Z

--
View this message in context: http://apache-accumulo.1065345.n5.nabble.com/using-combiner-vs-building-stats-cache-tp14979.html
Sent from the Developers mailing list archive at Nabble.com.
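
To make the two options concrete, here is a minimal sketch in plain Python. It is only a model of the trade-off and does not use the Accumulo API. The names `MASTER`, `rebuild_stats`, and `CombinedStats` are hypothetical, and so is the choice to count every term of each triple. In Accumulo itself, option 2 would presumably be a `SummingCombiner` configured on the stats table; the `compact` method below stands in for the fact that combiners are also applied during compactions, so the partial counts do not pile up indefinitely.

```python
from collections import Counter, defaultdict

# Hypothetical master table: (subject, predicate, object) triples.
MASTER = [
    ("alice", "knows", "foo"),
    ("bob", "likes", "foo"),
    ("foo", "type", "word"),
    ("bob", "knows", "bar"),
]


def rebuild_stats(master):
    """Option 1: recompute every count from the master table in one batch pass."""
    stats = Counter()
    for triple in master:
        for term in triple:
            stats[term] += 1
    return stats


class CombinedStats:
    """Option 2 (model only): keep one partial count per insert, sum on read."""

    def __init__(self):
        self._partials = defaultdict(list)

    def insert(self, term, count=1):
        # Mirrors inserting ['foo', 1] alongside each write to the master table.
        self._partials[term].append(count)

    def lookup(self, term):
        # Read-time aggregation: cost grows with the number of stored partials.
        return sum(self._partials.get(term, []))

    def compact(self):
        # Stand-in for a compaction: collapse the partials into one entry each.
        for term in list(self._partials):
            self._partials[term] = [sum(self._partials[term])]

    def stored_entries(self, term):
        return len(self._partials.get(term, []))


batch = rebuild_stats(MASTER)

live = CombinedStats()
for triple in MASTER:
    for term in triple:
        live.insert(term)

entries_before = live.stored_entries("foo")
live.compact()
entries_after = live.stored_entries("foo")
```

Both approaches give the same count for 'foo'. The difference is where the work happens: the batch version rescans the whole master table, while the combiner version sums only the partial entries for the requested term, and after a compaction that is a single entry.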
