Thanks, Dylan, and thanks to Josh for chiming in late, which is always helpful. After Dylan's reply, I ran a quick experiment:

1. Set a SummingCombiner with -all (scan, minor, and major compaction scopes) on the table (see the Java sketch after this list).
2. Delete the default versioning iterator from the table, so I can see whether the rows actually get combined.
3. Insert row id = 'foo', value = 1.
4. Insert row id = 'foo', value = 1 again.
5. Scan the table: it returns 1 row, 'foo' with value 2 (correct, as expected).
6. Delete the SummingCombiner, so the table now has no iterators at all.
7. Scan the table again: it now returns 2 rows, both 'foo' with value 1.
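In case it helps anyone reading along, here is roughly the Java-API equivalent of steps 1 and 2 (I did them in the shell). The Connector variable `conn` and the table name "stats" are just placeholders for illustration, and I'm assuming STRING-encoded long values:

```java
import java.util.EnumSet;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.iterators.IteratorUtil.IteratorScope;
import org.apache.accumulo.core.iterators.LongCombiner;
import org.apache.accumulo.core.iterators.user.SummingCombiner;

public class CombinerSetup {
  static void setup(Connector conn) throws Exception {
    // Step 1: attach a SummingCombiner at all scopes (scan, minc, majc),
    // the equivalent of `setiter ... -all` in the shell.
    IteratorSetting setting = new IteratorSetting(10, "sum", SummingCombiner.class);
    SummingCombiner.setEncodingType(setting, LongCombiner.Type.STRING);
    SummingCombiner.setCombineAllColumns(setting, true);
    conn.tableOperations().attachIterator("stats", setting);

    // Step 2: drop the default versioning iterator ("vers") so duplicate
    // versions of a key survive and we can tell whether combining happened.
    conn.tableOperations().removeIterator("stats", "vers",
        EnumSet.allOf(IteratorScope.class));
  }
}
```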
Then I deleted the table and redid all the steps above, except that I replaced step 5 with "flush -w". At step 7, the scan now returns 1 row, 'foo' with value 2, which is exactly what I want: the flush forces a minor compaction, the minc-scope combiner runs during it, and the combined result gets persisted instead of being recomputed on every scan. So the approach I was considering, writing a snapshot to another table to avoid the aggregation work on every scan, is no longer needed; Accumulo already takes care of it. After compaction, the table holds one row per unique key with the aggregated value. Cool!

Thanks for the tips, Josh. We are using a BatchWriter, so throughput should already be better. But I just looked at our code, and it calls batchWriter.flush() after every addMutation() call, which doesn't seem like good utilization of the batch writer. I'm curious how people normally batch their inserts/updates. The worry is that the process may crash, and we'd unfortunately lose the buffered changes :-(

Thanks,
Z
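P.S. For concreteness, a minimal sketch of what I'd expect our write path to look like if we let the BatchWriter do its own batching. The Connector `conn`, the table name "stats", the column names, and the 10 MB / 30-second thresholds are illustrative placeholders, not our real settings:

```java
import java.util.concurrent.TimeUnit;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

public class CountWriter {
  static void writeCounts(Connector conn) throws Exception {
    BatchWriterConfig config = new BatchWriterConfig()
        .setMaxMemory(10 * 1024 * 1024)       // buffer up to ~10 MB of mutations client-side
        .setMaxLatency(30, TimeUnit.SECONDS)  // background-flush at least every 30 seconds
        .setMaxWriteThreads(4);               // parallel writes to the tablet servers

    BatchWriter writer = conn.createBatchWriter("stats", config);
    try {
      for (int i = 0; i < 1000; i++) {
        Mutation m = new Mutation(new Text("foo"));
        m.put(new Text("stat"), new Text("count"), new Value("1".getBytes()));
        writer.addMutation(m); // buffered; no flush() per mutation
      }
      // Flush only at points we could replay from after a crash, e.g. after
      // checkpointing the upstream source offset, not after every mutation.
      writer.flush();
    } finally {
      writer.close(); // close() flushes anything still buffered
    }
  }
}
```

The trade-off is the one above: anything buffered since the last flush is lost if the process crashes, so the flush interval has to match how much input we can afford to replay.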
