[ 
https://issues.apache.org/jira/browse/ACCUMULO-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved ACCUMULO-3646.
----------------------------------
    Resolution: Fixed

> Duplicate entries when iterator emits entries past seek() range
> ---------------------------------------------------------------
>
>                 Key: ACCUMULO-3646
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3646
>             Project: Accumulo
>          Issue Type: Bug
>          Components: docs
>    Affects Versions: 1.6.1
>         Environment: Ubuntu 14.04, Accumulo 1.6.1, Hadoop 2.6.0, ZooKeeper 
> 3.4.6
>            Reporter: Dylan Hutchison
>            Assignee: Dylan Hutchison
>            Priority: Minor
>             Fix For: 1.7.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The SortedKeyValueIterator seek() documentation states that an iterator may 
> return keys past the range passed to seek().  However, a scan-time iterator 
> that emits entries past that range causes those entries to be returned 
> multiple times when the client uses a BatchScanner.  This does not occur 
> when the client uses a Scanner.  The behavior is unrelated to the 
> VersioningIterator and to the entries actually stored in the table.  It also 
> affects MiniAccumulo.
> If this is intended, we should update the SortedKeyValueIterator seek() 
> documentation with a warning that returning keys past the seek() range may 
> result in a client seeing duplicate keys. If this is not intended, then it is 
> a bug.
> Test code: See 
> [InjectTest|https://github.com/Accla/d4m_api_java/blob/master/src/test/java/edu/mit/ll/graphulo/InjectTest.java]
> * method {{testInjectOnScan_Empty}} fails because it uses a BatchScanner
> * method {{testInjectOnScan_Empty_Reg}} passes because it uses a Scanner
> In these methods, the 
> [InjectIterator|https://github.com/Accla/d4m_api_java/blob/master/src/main/java/edu/mit/ll/graphulo/InjectIterator.java]
>  emits entries that go beyond the seek() range.  We confirm what is going on 
> by placing a 
> [DebugIterator|https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/DebugIterator.html]
>  right after.
> Logs when using the BatchScanner (note that the "m1" row is returned twice, 
> once per seek() call):
> {noformat}
> 2015-03-05 06:05:34,768 [graphulo.BranchIterator] INFO : class 
> edu.mit.ll.graphulo.InjectIterator: init on scope scan
> 2015-03-05 06:05:34,768 [graphulo.BranchIterator] INFO : class 
> edu.mit.ll.graphulo.InjectIterator: init on scope scan
> 2015-03-05 06:05:34,770 [iterators.DebugIterator] DEBUG: 
> init(edu.mit.ll.graphulo.InjectIterator@e9fe846, {}, 
> org.apache.accumulo.tserver.TabletIteratorEnvironment@b99fd03)
> 2015-03-05 06:05:34,771 [iterators.DebugIterator] DEBUG: 0x516E9F1F 
> seek((-inf,f%00; : [] 9223372036854775807 false), [], false)
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() 
> --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F 
> getTopKey() --> a1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() 
> --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F 
> getTopKey() --> a1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F 
> getTopValue() --> 1
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F next()
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() 
> --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F 
> getTopKey() --> c1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() 
> --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F 
> getTopKey() --> c1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() 
> --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F 
> getTopKey() --> c1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F 
> getTopValue() --> 1
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F next()
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() 
> --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F 
> getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() 
> --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F 
> getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() 
> --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F 
> getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F 
> getTopValue() --> 1
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F next()
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() 
> --> false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() 
> --> false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() 
> --> false
> 2015-03-05 06:05:34,770 [iterators.DebugIterator] DEBUG: 
> init(edu.mit.ll.graphulo.InjectIterator@2528a1f1, {}, 
> org.apache.accumulo.tserver.TabletIteratorEnvironment@244a532a)
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA 
> seek([f%00; : [] 9223372036854775807 false,+inf), [], false)
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() 
> --> true
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA 
> getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() 
> --> true
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA 
> getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() 
> --> true
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA 
> getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA 
> getTopValue() --> 1
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA next()
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() 
> --> false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() 
> --> false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() 
> --> false
> {noformat}
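
The duplication visible in the log can be reproduced without Accumulo. The following is a minimal sketch, not the InjectIterator or BatchScanner code: it assumes the table is split at row "f" (simplified from the "f%00;" boundary in the log), that the scan issues one seek() per tablet range, and that the injecting iterator honors the start of the range but not its end. The class and method names (SeekRangeDemo, seek, batchScan) are hypothetical. It also shows that clipping emitted rows to the seek() range would avoid the duplicate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SeekRangeDemo {
    // Rows the hypothetical injecting iterator emits, in sorted order (as in the log).
    static final List<String> INJECTED = Arrays.asList("a1", "c1", "m1");

    // Row ranges are half-open [start, end); null means unbounded on that side.
    static boolean atOrAfterStart(String row, String start) {
        return start == null || row.compareTo(start) >= 0;
    }

    static boolean beforeEnd(String row, String end) {
        return end == null || row.compareTo(end) < 0;
    }

    // One seek() on one tablet range: always honors the start bound,
    // and honors the end bound only when clip is true.
    static List<String> seek(String start, String end, boolean clip) {
        List<String> out = new ArrayList<>();
        for (String row : INJECTED) {
            if (atOrAfterStart(row, start) && (!clip || beforeEnd(row, end))) {
                out.add(row);
            }
        }
        return out;
    }

    // Simplified model of the batch scan: one seek per tablet range,
    // with the results concatenated on the client side.
    static List<String> batchScan(boolean clip) {
        List<String> seen = new ArrayList<>();
        seen.addAll(seek(null, "f", clip)); // (-inf, f)
        seen.addAll(seek("f", null, clip)); // [f, +inf)
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(batchScan(false)); // prints [a1, c1, m1, m1]
        System.out.println(batchScan(true));  // prints [a1, c1, m1]
    }
}
```

In a real iterator, the equivalent of the clip flag would be a check of each emitted key against the Range passed to seek(); whether that check belongs in the iterator or in the scan path is the question this issue raises.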



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
