Revisiting this topic: if I go with option #2, i.e. having a batch job fix the stats table, I am no longer sure it will work. The stats table already has the summing combiner enabled, so the batch job can't simply write the correct value; the combiner would add it to the existing value and the result would be wrong. For example:
The current stats table contains:

foo  | 2
bar  | 3
test | 1

The batch job scans the main table and then updates the stats table. Let's say the actual stats are foo=1, bar=4, test=1. The final stats table would then become:

foo  | 3
bar  | 7
test | 2

The result would be correct if the batch job removed the summing combiner from the table, but then another process (not the batch job) might update a particular key and overwrite the correct value written by the batch job. We can't tolerate taking the system offline; otherwise we could refresh the stats during that downtime.

Any ideas on how to solve this problem?

Unfortunately there is an inherent problem with the summing combiner. When a key that already exists is added to the main table, the write behaves just like an update, but my current logic still adds <key>|1 to the stats table, so after many such updates some values in the stats table will be far off. Deletes are similar: deleting a key that doesn't exist is a no-op for the main table, but the app logic still adds <key>|-1 to the stats table.

This is the reason we're thinking of having a batch job to 'fix' the stats table, but that also has its own problem :-(

Thanks,
Z

--
View this message in context: http://apache-accumulo.1065345.n5.nabble.com/another-question-on-summing-combiner-tp15238p15351.html
Sent from the Developers mailing list archive at Nabble.com.
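
To make the two problems above concrete, here is a minimal Python sketch. It is not Accumulo code: `SummingStatsTable`, `app_put` and `app_delete` are hypothetical names, and the class is only an in-memory stand-in that assumes a summing combiner behaves as "a read returns the sum of every value written for the key". The main table is assumed to be a plain key/value map.

```python
from collections import defaultdict


class SummingStatsTable:
    """Toy in-memory stand-in for a table with a summing combiner.

    Every write is kept, and a read returns the sum of all writes for the key.
    """

    def __init__(self, initial=None):
        self._writes = defaultdict(list)
        for key, value in (initial or {}).items():
            self.put(key, value)

    def put(self, key, value):
        # A write never replaces the old value; it is one more term in the sum.
        self._writes[key].append(value)

    def get(self, key):
        return sum(self._writes.get(key, []))

    def snapshot(self):
        return {key: sum(values) for key, values in self._writes.items()}


# Problem 1: the batch job writes the actual counts into a summing table.
stats = SummingStatsTable({"foo": 2, "bar": 3, "test": 1})
for key, actual in {"foo": 1, "bar": 4, "test": 1}.items():
    stats.put(key, actual)
print(stats.snapshot())  # {'foo': 3, 'bar': 7, 'test': 2}

# Problem 2: the app logic always writes +1 / -1, even for updates and no-op deletes.
main_table = {}
drifted = SummingStatsTable()


def app_put(key, value):
    main_table[key] = value  # an existing key is simply overwritten (an update)
    drifted.put(key, 1)      # but the stats table still receives <key>|1


def app_delete(key):
    main_table.pop(key, None)  # no-op when the key does not exist
    drifted.put(key, -1)       # but the stats table still receives <key>|-1


app_put("foo", "v1")
app_put("foo", "v2")
app_put("foo", "v3")
app_delete("bar")
print(sorted(main_table))  # ['foo']
print(drifted.snapshot())  # {'foo': 3, 'bar': -1}
```

In the second half the main table holds a single "foo" entry and no "bar" entry, yet the summed stats report foo=3 and bar=-1, which is the drift the batch job was meant to repair.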
