[ 
https://issues.apache.org/jira/browse/ACCUMULO-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879162#comment-13879162
 ] 

Josh Elser commented on ACCUMULO-2232:
--------------------------------------

I'm a little worried about the performance implications (sorry for using that 
phrase) of running combiners only on full MajCs. With heavy combination, you 
would persist and later re-read many uncombined records, rather than a single 
combined one, for a potentially very long time (assuming full MajCs are few 
and far between).

I can't come up with another easy way to fix it for the SummingCombiner 
example, though, and being accurate is still better than being fast. Anything 
else I can think of would involve persisting deletes across non-full 
compactions, which I imagine would require quite a bit more work to get correct.

> Combiners can cause deleted data to come back
> ---------------------------------------------
>
>                 Key: ACCUMULO-2232
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2232
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client, tserver
>            Reporter: John Vines
>
> The case-
> 3 files with-
> * 1 with a key, k, with timestamp 0, value 3
> * 1 with a delete of k with timestamp 1
> * 1 with k with timestamp 2, value 2
> The column of k has a summing combiner set on it. The issue here is that, 
> depending on how the major compactions play out, differing values will 
> result. If all 3 files compact together, the correct value of 2 will result. 
> However, if files 1 & 3 compact first, they will aggregate to 5. The delete 
> will then fall after the combined value, causing the result 5 to persist.
> First and foremost, this should be documented. I think to remedy this, 
> combiners should only be used on full MajCs, not on non-full ones. This may 
> necessitate a special flag or a new combiner that implements the proper 
> semantics.
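
The three-file case in the description can be reproduced with a small toy
model. This is only a sketch under stated assumptions, not Accumulo code:
`Entry`, `compact`, `read`, and the `combine_on_partial` switch are
hypothetical names made up for illustration. The model assumes that a delete
masks every entry with a timestamp less than or equal to its own, that delete
markers are dropped only by a full compaction, and that the combiner emits the
sum at the newest surviving timestamp.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    """One version of key k. value=None marks a delete (toy model only)."""
    ts: int
    value: Optional[int]


def compact(files: List[List[Entry]], full: bool,
            combine_on_partial: bool = True) -> List[Entry]:
    """Merge files for a single key, newest entry first.

    Hypothetical model: puts at or below the newest delete's timestamp are
    dropped; the delete marker itself survives unless the compaction is full.
    """
    entries = [e for f in files for e in f]
    delete_ts = max((e.ts for e in entries if e.value is None), default=None)
    live = sorted(
        (e for e in entries
         if e.value is not None and (delete_ts is None or e.ts > delete_ts)),
        key=lambda e: e.ts, reverse=True)

    out: List[Entry] = []
    if live:
        if full or combine_on_partial:
            # Summing combiner: one entry at the newest surviving timestamp.
            out.append(Entry(live[0].ts, sum(e.value for e in live)))
        else:
            # Proposed remedy: leave entries uncombined until a full MajC.
            out.extend(live)
    if delete_ts is not None and not full:
        out.append(Entry(delete_ts, None))
    return sorted(out, key=lambda e: e.ts, reverse=True)


def read(files: List[List[Entry]]) -> Optional[int]:
    """Value of k after a full compaction, or None if nothing survives."""
    result = compact(files, full=True)
    return result[0].value if result else None


f1 = [Entry(0, 3)]      # k, timestamp 0, value 3
f2 = [Entry(1, None)]   # delete of k, timestamp 1
f3 = [Entry(2, 2)]      # k, timestamp 2, value 2

# All three files compact together: the delete masks the value 3.
all_at_once = read([f1, f2, f3])

# Files 1 & 3 compact first, then the result meets the delete.
partial = compact([f1, f3], full=False)
deleted_data_returns = read([partial, f2])

# Same order, but combining is deferred until the full compaction.
deferred = compact([f1, f3], full=False, combine_on_partial=False)
with_remedy = read([deferred, f2])

print(all_at_once, deleted_data_returns, with_remedy)
```

In this model the second ordering folds the pre-delete value 3 into an entry
stamped newer than the delete, which is why the delete can no longer mask it.
Deferring the combine keeps the original timestamps intact, at the cost of
carrying the uncombined entries until the next full MajC.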



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
