Re: another question on summing combiner

z11373 Fri, 23 Oct 2015 13:28:56 -0700

Hi Dylan,
Right now we don't perform a check (read) before performing an update. Below
is a simple scenario.


The Main table is initially empty. The client then sends a request that
translates to inserting the data, i.e.
Main table:
A
B
C
D

Stats table:
A 1
B 1
C 1
D 1

Let's say the next request is to delete C.
Main table:
A
B
D

Stats table:
A 1
B 1
C 0 (1 + -1)
D 1
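
The "1 + -1" above is the summing combiner at work: every write is a delta,
and a read returns the sum of all deltas for that key. A minimal sketch of
that behavior in plain Python (not the Accumulo API; the function name
summing_combine and the list of (key, delta) pairs are made up for
illustration):

```python
from collections import defaultdict


def summing_combine(mutations):
    """Collapse (key, delta) writes into one total per key.

    Hypothetical stand-in for what a summing combiner returns on read.
    """
    totals = defaultdict(int)
    for key, delta in mutations:
        totals[key] += delta
    return dict(totals)


# Four inserts (+1 each), then the delete of C (-1).
writes = [("A", 1), ("B", 1), ("C", 1), ("D", 1), ("C", -1)]
stats = summing_combine(writes)
```

Note that C stays in the Stats table with a total of 0 rather than
disappearing, which matches the table above.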

The next request is to update B and D (the request gets translated to delete
B and D, then insert B and D), but let's say it somehow fails between the
delete and insert operations, so the tables would look like:
Main table:
A

Stats table:
A 1
B 0
C 0
D 0

The client is fault-tolerant and retries the entire request, so now the
tables would look like:
Main table:
A
B
D

Stats table:
A 1
B 0 (-1 + 1)
C 0
D 0 (-1 + 1)
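
The whole sequence can be replayed in a few lines to confirm the numbers. This
is only a sketch in plain Python, not our real client code: main and stats are
in-memory stand-ins for the two tables, and insert, delete, update and the
fail_after_delete flag are hypothetical names. The key assumption is the one
stated at the top: every operation writes +1 or -1 to Stats without reading
first.

```python
from collections import Counter

main = set()       # stand-in for the Main table
stats = Counter()  # stand-in for the Stats table (summed deltas per key)


def insert(key):
    main.add(key)
    stats[key] += 1  # blind +1, no read of the current state


def delete(key):
    main.discard(key)
    stats[key] -= 1  # blind -1, even if the row is already gone


def update(keys, fail_after_delete=False):
    """Update = delete all keys, then insert them again."""
    for key in keys:
        delete(key)
    if fail_after_delete:
        raise RuntimeError("simulated failure between delete and insert")
    for key in keys:
        insert(key)


for key in "ABCD":  # initial request: insert A, B, C, D
    insert(key)
delete("C")         # second request: delete C

try:                # third request fails halfway through
    update(["B", "D"], fail_after_delete=True)
except RuntimeError:
    update(["B", "D"])  # client retries the entire request

final_main = sorted(main)
final_stats = dict(stats)
```

After the retry, final_main holds A, B and D, while final_stats has B and D
at 0: the retried delete subtracts a second time for rows that were already
removed.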


As you can see above, the end state of the Main table is correct, because the
retry performs the 'update', but unfortunately the Stats table is not: it
shows 0 for B and D even though both rows exist.
The idea I mentioned last time was to have a batch job that scans the whole
Main table to get the 'truth' data and updates the Stats table accordingly.
But in order to update 'accordingly', it first has to read the current value
in the Stats table (because the combiner sums whatever we write into the
existing value), which affects performance.
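
To make the cost concrete, here is a rough sketch of what such a batch job
would have to do. It is plain Python over dicts, not Accumulo code; the
function name reconcile is hypothetical, and it assumes each row present in
the Main table should count as exactly 1. Because the combiner adds whatever
is written, the job cannot simply write the true value; it has to read the
current total and write the difference.

```python
def reconcile(main_keys, stats):
    """Return the correction deltas needed to bring stats in line with main.

    main_keys: keys found by scanning the Main table (the 'truth').
    stats: current combined totals read from the Stats table.
    """
    truth = {key: 1 for key in main_keys}  # assumption: one count per row
    corrections = {}
    for key in set(truth) | set(stats):
        current = stats.get(key, 0)         # the extra read per key
        delta = truth.get(key, 0) - current
        if delta != 0:
            corrections[key] = delta        # written as a delta, then summed
    return corrections


# End state from the scenario above.
main_keys = {"A", "B", "D"}
stats = {"A": 1, "B": 0, "C": 0, "D": 0}

corrections = reconcile(main_keys, stats)
repaired = {key: stats.get(key, 0) + corrections.get(key, 0) for key in stats}
```

Here only B and D need a +1 correction, but the job still had to read every
key in Stats to find that out, which is the performance concern.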


Thanks,
Z





