[ https://issues.apache.org/jira/browse/ACCUMULO-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901933#comment-14901933 ]

ASF GitHub Bot commented on ACCUMULO-2232:
------------------------------------------

Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/47#discussion_r40052742
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/Combiner.java ---
    @@ -149,6 +159,37 @@ public void next() throws IOException {
     
       private Key workKey = new Key();
     
    +  private static Cache<String,Long> loggedMsgCache = CacheBuilder.newBuilder().expireAfterWrite(1, TimeUnit.HOURS).build();
    +
    +  private void sawDelete() {
    +    if (isPartialCompaction) {
    +      switch (deleteHandlingAction) {
    +        case LOG_ERROR:
    +          try {
    +            loggedMsgCache.get(this.getClass().getName(), new Callable<Long>() {
    +              @Override
    +              public Long call() throws Exception {
    +                log.error("Combiner of type " + this.getClass().getSimpleName()
    +                    + " saw a delete during a partial compaction.  This could cause undesired results.  See ACCUMULO-2232.  Will not log subsequent occurences for at least 1 hour.");
    +                return System.currentTimeMillis();
    --- End diff --
    
    Do we really need the current time? Isn't `0` sufficient, since we don't
    inspect the value of the cache entries? A comment noting that the value
    returned by the Callable is meaningless would also be nice.


> Combiners can cause deleted data to come back
> ---------------------------------------------
>
>                 Key: ACCUMULO-2232
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2232
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client, tserver
>            Reporter: John Vines
>
> The case:
> 3 files:
> * 1 with a key, k, with timestamp 0, value 3
> * 1 with a delete of k with timestamp 1
> * 1 with k with timestamp 2, value 2
> The column of k has a summing combiner set on it. The issue here is that, 
> depending on how the major compactions play out, differing values will 
> result. If all 3 files compact together, the correct value of 2 will result. 
> However, if files 1 & 3 compact first, they will aggregate to 5. The delete 
> then sorts after the combined value, so the incorrect value 5 persists.
> First and foremost, this should be documented. I think that, to remedy this, 
> combiners should only be applied on full MajCs, not on partial ones. This may 
> necessitate a special flag or a new combiner that implements the proper 
> semantics.
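
The scenario in the issue description can be reproduced with a small standalone model. This is only a sketch under stated assumptions: it does not use any Accumulo classes, and `Entry`, `compact`, `scan`, and the two scenario methods are hypothetical names invented for illustration. It assumes a single key k, newest-timestamp-first ordering, a delete that masks every older version, and a partial compaction that keeps the delete marker it sees.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CombinerDeleteSketch {

    /** One version of key k: a timestamp plus either a value or a delete marker. */
    static final class Entry {
        final long ts;
        final long value;
        final boolean delete;

        Entry(long ts, long value, boolean delete) {
            this.ts = ts;
            this.value = value;
            this.delete = delete;
        }
    }

    static Entry put(long ts, long value) { return new Entry(ts, value, false); }
    static Entry del(long ts) { return new Entry(ts, 0L, true); }

    /**
     * Hypothetical model of a compaction with a summing combiner on k.
     * Versions are visited newest first; a delete masks everything older.
     * A partial compaction keeps the delete marker, a full one drops it.
     */
    static List<Entry> compact(boolean full, List<List<Entry>> files) {
        List<Entry> merged = new ArrayList<>();
        for (List<Entry> f : files) {
            merged.addAll(f);
        }
        merged.sort(Comparator.comparingLong((Entry e) -> e.ts).reversed());

        long sum = 0;
        long newestTs = -1;
        boolean sawValue = false;
        Entry delete = null;
        for (Entry e : merged) {
            if (e.delete) {
                delete = e;   // everything older than this is masked
                break;
            }
            if (!sawValue) {
                newestTs = e.ts;   // combined entry takes the newest timestamp
                sawValue = true;
            }
            sum += e.value;
        }

        List<Entry> out = new ArrayList<>();
        if (sawValue) {
            out.add(put(newestTs, sum));
        }
        if (delete != null && !full) {
            out.add(delete);
        }
        return out;
    }

    /** What a reader sees for k: modeled as a full merge of all files (0 if nothing is visible). */
    static long scan(List<List<Entry>> files) {
        List<Entry> result = compact(true, files);
        return result.isEmpty() ? 0L : result.get(0).value;
    }

    /** All three files are combined in one pass. */
    static long allThreeTogether() {
        return scan(List.of(List.of(put(0, 3)), List.of(del(1)), List.of(put(2, 2))));
    }

    /** Files 1 and 3 are partially compacted first, without the delete in file 2. */
    static long filesOneAndThreeFirst() {
        List<Entry> partial = compact(false, List.of(List.of(put(0, 3)), List.of(put(2, 2))));
        return scan(List.of(partial, List.of(del(1))));
    }

    public static void main(String[] args) {
        System.out.println(allThreeTogether());       // prints 2
        System.out.println(filesOneAndThreeFirst());  // prints 5
    }
}
```

In this model the partial compaction emits the combined value 5 at timestamp 2, which is newer than the delete at timestamp 1, so the delete can no longer mask the contribution of the value written at timestamp 0.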



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
