[ https://issues.apache.org/jira/browse/ACCUMULO-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901933#comment-14901933 ]
ASF GitHub Bot commented on ACCUMULO-2232:
------------------------------------------

Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/47#discussion_r40052742

    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/Combiner.java ---
    @@ -149,6 +159,37 @@ public void next() throws IOException {

       private Key workKey = new Key();

    +  private static Cache<String,Long> loggedMsgCache = CacheBuilder.newBuilder().expireAfterWrite(1, TimeUnit.HOURS).build();
    +
    +  private void sawDelete() {
    +    if (isPartialCompaction) {
    +      switch (deleteHandlingAction) {
    +        case LOG_ERROR:
    +          try {
    +            loggedMsgCache.get(this.getClass().getName(), new Callable<Long>() {
    +              @Override
    +              public Long call() throws Exception {
    +                log.error("Combiner of type " + this.getClass().getSimpleName()
    +                    + " saw a delete during a partial compaction. This could cause undesired results. See ACCUMULO-2232. Will not log subsequent occurences for at least 1 hour.");
    +                return System.currentTimeMillis();
    --- End diff --

    Do we really need the current time? Isn't `0` sufficient, since we don't inspect the value of the cache entries? A comment that the value returned by the Callable is meaningless would also be nice.


> Combiners can cause deleted data to come back
> ---------------------------------------------
>
>                 Key: ACCUMULO-2232
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2232
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client, tserver
>            Reporter: John Vines
>
> The case-
> 3 files with-
> * 1 with a key, k, with timestamp 0, value 3
> * 1 with a delete of k with timestamp 1
> * 1 with k with timestamp 2, value 2
> The column of k has a summing combiner set on it. The issue here is that,
> depending on how the major compactions play out, differing values will
> result. If all 3 files compact together, the correct value of 2 will result.
> However, if files 1 & 3 compact first, they will aggregate to 5. The delete
> will then fall after the combined value, so the result 5 persists.
> First and foremost, this should be documented. I think to remedy this,
> combiners should only be used on full MajC, not on partial ones. This may
> necessitate a special flag or a new combiner that implements the proper
> semantics.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
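
The three-file case in the issue description can be worked through with a small stand-alone model. This is only a sketch under simplifying assumptions: it does not use the Accumulo API, and the names `CompactionOrderSketch`, `Entry`, `compact` and `visibleValue` are hypothetical. It models a single key, a summing combiner, and a delete that masks every older version of that key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the ACCUMULO-2232 scenario; not Accumulo code.
class CompactionOrderSketch {

  /** One version of key k: either a value or a delete marker at a timestamp. */
  static final class Entry {
    final long timestamp;
    final long value;
    final boolean delete;

    private Entry(long timestamp, long value, boolean delete) {
      this.timestamp = timestamp;
      this.value = value;
      this.delete = delete;
    }

    static Entry put(long timestamp, long value) {
      return new Entry(timestamp, value, false);
    }

    static Entry delete(long timestamp) {
      return new Entry(timestamp, 0, true);
    }
  }

  /**
   * Merges the given versions newest first, summing values until a delete is
   * reached. The delete marker is kept and everything older is dropped.
   */
  static List<Entry> compact(List<Entry> input) {
    List<Entry> sorted = new ArrayList<>(input);
    sorted.sort(Comparator.comparingLong((Entry e) -> e.timestamp).reversed());
    List<Entry> out = new ArrayList<>();
    long sum = 0;
    long newest = -1; // timestamp given to the combined value
    for (Entry e : sorted) {
      if (e.delete) {
        if (newest >= 0) {
          out.add(Entry.put(newest, sum));
        }
        out.add(e);
        return out; // older versions are masked by the delete
      }
      if (newest < 0) {
        newest = e.timestamp;
      }
      sum += e.value;
    }
    if (newest >= 0) {
      out.add(Entry.put(newest, sum));
    }
    return out;
  }

  /** Value a reader would see for k, or null if the newest entry is a delete. */
  static Long visibleValue(List<Entry> entries) {
    List<Entry> merged = compact(entries);
    if (merged.isEmpty() || merged.get(0).delete) {
      return null;
    }
    return merged.get(0).value;
  }

  static final Entry FILE1 = Entry.put(0, 3);
  static final Entry FILE2 = Entry.delete(1);
  static final Entry FILE3 = Entry.put(2, 2);

  /** All three files compact together. */
  static long fullCompactionResult() {
    return visibleValue(Arrays.asList(FILE1, FILE2, FILE3));
  }

  /** Files 1 and 3 compact first; the delete in file 2 is merged in afterwards. */
  static long partialThenFullResult() {
    List<Entry> merged = new ArrayList<>(compact(Arrays.asList(FILE1, FILE3)));
    merged.add(FILE2);
    return visibleValue(merged);
  }

  public static void main(String[] args) {
    System.out.println("all three files together: " + fullCompactionResult());   // 2
    System.out.println("files 1 & 3 first:        " + partialThenFullResult());  // 5
  }
}
```

In this model the partial compaction of files 1 and 3 stamps the combined value with timestamp 2, so the delete at timestamp 1 can no longer mask the older value 3 that has been folded into the sum.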