Hi,

Last time I posted about using summing combiners to build the stats table. For example, when adding an item to the main table, the app also inserts that item with value 1 into the stats table, which has the summing combiner attached. Likewise, when deleting from the main table, it inserts the item with value -1. This works fine, except in cases like:

1. inserting an item that already exists (i.e. same key) in the main table
2. deleting an item that doesn't exist in the main table
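
To make the two cases concrete, here is a minimal plain-Python sketch with no Accumulo involved. The dict-backed `main` and `stats` tables and the `put`/`delete` helpers are hypothetical stand-ins; adding up the +1/-1 deltas only imitates what the summing combiner does on the stats table.

```python
from collections import defaultdict

main = {}                  # stand-in for the main table: key -> value
stats = defaultdict(int)   # stand-in for the stats table; += imitates the summing combiner

def put(key, value, group="total"):
    """Blind write: upsert into main and always add +1 to the stats entry."""
    main[key] = value
    stats[group] += 1

def delete(key, group="total"):
    """Blind delete: remove from main if present and always add -1."""
    main.pop(key, None)
    stats[group] -= 1

put("a", 1)
put("b", 2)
put("a", 3)      # case 1: key "a" already exists, but stats is still bumped
delete("zzz")    # case 2: key never existed, but stats is still decremented
delete("b")
delete("b")      # case 2 again: "b" is already gone

actual = len(main)          # items really present in the main table
counted = stats["total"]    # what the summed deltas claim
drift = counted - actual    # non-zero means the stats no longer match
```

Here the main table ends up holding one item while the summed deltas come to zero, so the error can go in either direction depending on which case happens more often.
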

In either case the data in the stats table unfortunately becomes incorrect. The stats don't need to be precise (they are mainly there to give our app's optimizer a rough idea of the total number of items), but if either or both cases happen a lot, the optimizer may be thrown off.

I can think of 2 options to take care of this problem:

1. Check the existing data before each insert/delete. This incurs a performance cost and defeats one of the benefits of the summing combiner, which is that there is no need to check existing data; the combiner just does its job.
2. Have a job that 'fixes' the stats by recalculating everything (i.e. reads from the main table and rebuilds the stats table). This is expensive, but it could run once a day, so it may not be a terrible idea.

Let me know if any of you have a better solution than these.

Thanks,
Z
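
P.S. A rough sketch of both options in plain Python, with dicts standing in for the tables. `checked_put`, `checked_delete` and `rebuild_stats` are made-up names for illustration, not Accumulo API, and a real rebuild would be a full scan of the main table rather than a walk over an in-memory dict.

```python
main = {}              # stand-in for the main table
stats = {"total": 0}   # stand-in for the stats table (summed deltas)

def checked_put(key, value):
    """Option 1: read first, emit +1 only when the key is new."""
    is_new = key not in main     # this lookup is the extra cost per write
    main[key] = value
    if is_new:
        stats["total"] += 1

def checked_delete(key):
    """Option 1: read first, emit -1 only when the key exists."""
    if key in main:
        del main[key]
        stats["total"] -= 1

def rebuild_stats():
    """Option 2: ignore the accumulated deltas and recount from the main table."""
    stats["total"] = sum(1 for _ in main)

checked_put("a", 1)
checked_put("a", 2)      # duplicate insert, no extra +1
checked_delete("zzz")    # missing key, no -1
count_after_checked = stats["total"]

stats["total"] = -5      # pretend blind writes have let the count drift
rebuild_stats()
count_after_rebuild = stats["total"]
```

Option 1 pays for a lookup on every write, while option 2 keeps writes blind and pays for one full pass whenever the job runs.
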
