[
https://issues.apache.org/jira/browse/ACCUMULO-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880146#comment-13880146
]
Keith Turner commented on ACCUMULO-2232:
----------------------------------------
Another possible way to handle efficiency concerns is to have Accumulo initiate
a major compaction if scans are repeatedly combining a lot of data. This
serves as a use case for ACCUMULO-1266.
> Combiners can cause deleted data to come back
> ---------------------------------------------
>
> Key: ACCUMULO-2232
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2232
> Project: Accumulo
> Issue Type: Bug
> Components: client, tserver
> Reporter: John Vines
>
> The case: 3 files, with-
> * 1 with a key, k, with timestamp 0, value 3
> * 1 with a delete of k with timestamp 1
> * 1 with k with timestamp 2, value 2
> The column of k has a summing combiner set on it. The issue here is that,
> depending on how the major compactions play out, differing values will
> result. If all 3 files compact, the correct value of 2 will result. However,
> if 1 & 3 compact first, they will aggregate to 5. The delete will then
> sort after the combined value, causing the incorrect value 5 to persist.
> First and foremost, this should be documented. I think that to remedy this,
> combiners should only be applied on full MajC, not on partial ones. This may
> necessitate a special flag or a new combiner that implements the proper
> semantics.
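The scenario quoted above can be sketched with a small simulation. This is not Accumulo code; it is a hypothetical model that assumes the summing combiner writes its combined value at the newest timestamp it consumed, and that a delete suppresses only cells with older timestamps (both consistent with the described behavior):

```python
# Hypothetical model of per-key compaction with a summing combiner.
# A cell is (timestamp, value), where value is an int or 'DELETE'.

def compact(cells):
    """Merge cells for one key: sum all values newer than the most
    recent delete; the delete suppresses everything older than it.
    The combined value keeps the newest timestamp seen (assumption).
    Timestamps are assumed distinct so the sort never compares values."""
    cells = sorted(cells, reverse=True)  # newest first
    total, newest_ts, seen = 0, None, False
    for ts, val in cells:
        if val == 'DELETE':
            break  # cells older than the delete are suppressed
        total += val
        seen = True
        if newest_ts is None:
            newest_ts = ts
    return [(newest_ts, total)] if seen else []

file1 = [(0, 3)]          # k @ timestamp 0, value 3
file2 = [(1, 'DELETE')]   # delete of k @ timestamp 1
file3 = [(2, 2)]          # k @ timestamp 2, value 2

# Full compaction of all three files: the delete masks timestamp 0,
# leaving only the timestamp-2 cell.
full = compact(file1 + file2 + file3)        # [(2, 2)]

# Partial compaction of files 1 & 3 first: 3 + 2 combine to 5 at
# timestamp 2; the later delete at timestamp 1 sorts below it and
# can no longer remove the already-summed contribution.
partial = compact(compact(file1 + file3) + file2)  # [(2, 5)]
```

The two orderings disagree (2 vs. 5), which is exactly the non-determinism the issue describes: once a partial compaction folds a doomed cell into the sum, the delete arrives too late to undo it.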
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)