[
https://issues.apache.org/jira/browse/ACCUMULO-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Josh Elser resolved ACCUMULO-3646.
----------------------------------
Resolution: Fixed
> Duplicate entries when iterator emits entries past seek() range
> ---------------------------------------------------------------
>
> Key: ACCUMULO-3646
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3646
> Project: Accumulo
> Issue Type: Bug
> Components: docs
> Affects Versions: 1.6.1
> Environment: Ubuntu 14.04, Accumulo 1.6.1, Hadoop 2.6.0, Zookeeper
> 3.4.6
> Reporter: Dylan Hutchison
> Assignee: Dylan Hutchison
> Priority: Minor
> Fix For: 1.7.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The documentation for SortedKeyValueIterator's seek() method states that an
> iterator may return keys past the range passed to seek(). However, an
> iterator set at scan time that returns keys past that range will return
> those keys multiple times if the client uses a BatchScanner. This does not
> occur when the client uses a Scanner. The behavior is unrelated to the
> VersioningIterator and to the entries actually in the table. It also
> affects MiniAccumulo.
> If this is intended, we should update the SortedKeyValueIterator seek()
> documentation with a warning that returning keys past the seek() range may
> result in a client seeing duplicate keys. If this is not intended, then it is
> a bug.
> Test code: See
> [InjectTest|https://github.com/Accla/d4m_api_java/blob/master/src/test/java/edu/mit/ll/graphulo/InjectTest.java]
> * method {{testInjectOnScan_Empty}} fails because it uses a BatchScanner
> * method {{testInjectOnScan_Empty_Reg}} passes because it uses a Scanner
> In these methods, the
> [InjectIterator|https://github.com/Accla/d4m_api_java/blob/master/src/main/java/edu/mit/ll/graphulo/InjectIterator.java]
> emits entries that go beyond the seek() range. We confirm what is going on
> by placing a
> [DebugIterator|https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/DebugIterator.html]
> right after.
> Logs when using the BatchScanner. Notice that the "m1" row is returned
> twice, once after each seek():
> {noformat}
> 2015-03-05 06:05:34,768 [graphulo.BranchIterator] INFO : class
> edu.mit.ll.graphulo.InjectIterator: init on scope scan
> 2015-03-05 06:05:34,768 [graphulo.BranchIterator] INFO : class
> edu.mit.ll.graphulo.InjectIterator: init on scope scan
> 2015-03-05 06:05:34,770 [iterators.DebugIterator] DEBUG:
> init(edu.mit.ll.graphulo.InjectIterator@e9fe846, {},
> org.apache.accumulo.tserver.TabletIteratorEnvironment@b99fd03)
> 2015-03-05 06:05:34,771 [iterators.DebugIterator] DEBUG: 0x516E9F1F
> seek((-inf,f%00; : [] 9223372036854775807 false), [], false)
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop()
> --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F
> getTopKey() --> a1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop()
> --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F
> getTopKey() --> a1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F
> getTopValue() --> 1
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F next()
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop()
> --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F
> getTopKey() --> c1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop()
> --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F
> getTopKey() --> c1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop()
> --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F
> getTopKey() --> c1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F
> getTopValue() --> 1
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F next()
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop()
> --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F
> getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop()
> --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F
> getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop()
> --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F
> getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F
> getTopValue() --> 1
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F next()
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop()
> --> false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop()
> --> false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop()
> --> false
> 2015-03-05 06:05:34,770 [iterators.DebugIterator] DEBUG:
> init(edu.mit.ll.graphulo.InjectIterator@2528a1f1, {},
> org.apache.accumulo.tserver.TabletIteratorEnvironment@244a532a)
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA
> seek([f%00; : [] 9223372036854775807 false,+inf), [], false)
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop()
> --> true
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA
> getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop()
> --> true
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA
> getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop()
> --> true
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA
> getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA
> getTopValue() --> 1
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA next()
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop()
> --> false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop()
> --> false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop()
> --> false
> {noformat}
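
The duplication seen in the log above can be reproduced without Accumulo. The following is a minimal, self-contained sketch, not the InjectIterator code or the fix applied for this issue. It assumes a simplified model in which keys are plain row strings, the two seek() ranges from the log meet at row "f" (the log shows "f%00"), and the hypothetical names SeekRangeDuplicates, RowRange, emit and batchScan stand in for the real iterator and BatchScanner. The clipToRange flag illustrates one possible workaround: an iterator that drops entries beyond the end of its seek() range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SeekRangeDuplicates {

    /** Hypothetical row-only range [start, end); null means unbounded on that side. */
    public static final class RowRange {
        final String start;
        final String end;

        public RowRange(String start, String end) {
            this.start = start;
            this.end = end;
        }

        boolean atOrAfterStart(String row) {
            return start == null || row.compareTo(start) >= 0;
        }

        boolean beforeEnd(String row) {
            return end == null || row.compareTo(end) < 0;
        }
    }

    // The three rows that appear in the DebugIterator log.
    static final List<String> INJECTED = Arrays.asList("a1", "c1", "m1");

    /**
     * Models an iterator that emits injected rows starting at the seek() range start.
     * With clipToRange == false it ignores the range end, as in the reported scenario.
     */
    public static List<String> emit(RowRange range, boolean clipToRange) {
        List<String> out = new ArrayList<>();
        for (String row : INJECTED) {
            if (!range.atOrAfterStart(row)) {
                continue; // before the seek() start: never emitted
            }
            if (clipToRange && !range.beforeEnd(row)) {
                continue; // past the seek() end: dropped only when clipping
            }
            out.add(row);
        }
        return out;
    }

    /** Models a client that concatenates the results of the two seeks seen in the log. */
    public static List<String> batchScan(boolean clipToRange) {
        List<RowRange> ranges = Arrays.asList(
                new RowRange(null, "f"),  // (-inf, f)
                new RowRange("f", null)); // [f, +inf)
        List<String> results = new ArrayList<>();
        for (RowRange range : ranges) {
            results.addAll(emit(range, clipToRange));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println("unclipped: " + batchScan(false)); // [a1, c1, m1, m1]
        System.out.println("clipped:   " + batchScan(true));  // [a1, c1, m1]
    }
}
```

In this model the first seek emits "m1" even though it lies past the range end, and the second seek emits it again, which matches the two "m1" entries in the log. Whether clipping inside the iterator is appropriate for a real SortedKeyValueIterator depends on the iterator's purpose; it is shown here only to make the cause of the duplicate visible.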