Dylan Hutchison created ACCUMULO-3646:
-----------------------------------------

             Summary: Duplicate entries when iterator emits entries past seek() range
                 Key: ACCUMULO-3646
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3646
             Project: Accumulo
          Issue Type: Bug
          Components: client, mini, tserver
    Affects Versions: 1.6.1
         Environment: Ubuntu 14.04, Accumulo 1.6.1, Hadoop 2.6.0, ZooKeeper 3.4.6
            Reporter: Dylan Hutchison
            Priority: Minor



The documentation for SortedKeyValueIterator's seek() method states that an 
iterator may return keys past the range passed to seek().  However, when an 
iterator set at scan-time does so, the client sees those keys multiple times 
if it uses a BatchScanner.  This does not occur when the client uses a Scanner. 
The behavior is unrelated to the VersioningIterator and to the entries actually 
stored in the table. It also affects MiniAccumulo.
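
To make the failure mode concrete, here is a toy model of it in plain Java. It is 
only a sketch under stated assumptions and does not use or reproduce Accumulo 
code: rows are plain strings, the class and method names are hypothetical, the 
split point is simplified to "f", and the model simply concatenates the results 
of each per-range seek without any clipping, which is what the logs further down 
suggest happens with a BatchScanner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the report; plain strings stand in for Accumulo keys. */
public class SeekOverrunDemo {

    /** Half-open row range [start, end); a null bound means unbounded. */
    static final class RowRange {
        final String start;
        final String end;

        RowRange(String start, String end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Rows held by the hypothetical injecting iterator, in sorted order. */
    static final List<String> INJECTED = Arrays.asList("a1", "c1", "m1");

    /** Honors the start of the seek range but ignores its end. */
    static List<String> seekIgnoringEnd(RowRange range) {
        List<String> out = new ArrayList<>();
        for (String row : INJECTED) {
            if (range.start == null || row.compareTo(range.start) >= 0) {
                out.add(row);
            }
        }
        return out;
    }

    /** Concatenates the per-range results with no client-side clipping. */
    static List<String> scanAll(List<RowRange> ranges) {
        List<String> out = new ArrayList<>();
        for (RowRange range : ranges) {
            out.addAll(seekIgnoringEnd(range));
        }
        return out;
    }

    public static void main(String[] args) {
        // Two seek ranges, as if the table were split at row "f" (assumed).
        List<RowRange> ranges = Arrays.asList(
                new RowRange(null, "f"),
                new RowRange("f", null));
        System.out.println(scanAll(ranges)); // prints [a1, c1, m1, m1]
    }
}
```

In this model "m1" shows up once from the first seek, where it lies past the end 
of the range, and once from the second seek, where it legitimately belongs.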

If this is intended, we should update the SortedKeyValueIterator seek() 
documentation with a warning that returning keys past the seek() range may 
result in a client seeing duplicate keys. If this is not intended, then it is a 
bug.
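
Until that question is settled, one way an iterator author might sidestep the 
duplicates is to stop emitting entries once they fall outside the seek() range. 
The sketch below illustrates the idea in the same toy setting: plain strings 
stand in for keys, and all names are hypothetical. In a real iterator the 
equivalent check would presumably be made against the Range passed to seek(), 
for example with Range.contains(Key); that mapping is my assumption and is not 
something tested here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of an iterator that clips its output to the seek range. */
public class ClippedSeekDemo {

    /** Rows held by the hypothetical injecting iterator, in sorted order. */
    static final List<String> INJECTED = Arrays.asList("a1", "c1", "m1");

    /** True when row lies in [start, end); a null bound means unbounded. */
    static boolean inRange(String row, String start, String end) {
        boolean afterStart = start == null || row.compareTo(start) >= 0;
        boolean beforeEnd = end == null || row.compareTo(end) < 0;
        return afterStart && beforeEnd;
    }

    /** Emits only the injected rows that fall inside the seek range. */
    static List<String> seekClipped(String start, String end) {
        List<String> out = new ArrayList<>();
        for (String row : INJECTED) {
            if (inRange(row, start, end)) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Same two seek ranges as a table split at row "f" (assumed).
        List<String> all = new ArrayList<>(seekClipped(null, "f"));
        all.addAll(seekClipped("f", null));
        System.out.println(all); // prints [a1, c1, m1]
    }
}
```

With the end of the range enforced, each injected row is emitted by exactly one 
seek, so concatenating the per-range results no longer produces duplicates.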

Test code: See 
[InjectTest|https://github.com/Accla/d4m_api_java/blob/master/src/test/java/edu/mit/ll/graphulo/InjectTest.java]
* method {{testInjectOnScan_Empty}} fails because it uses a BatchScanner
* method {{testInjectOnScan_Empty_Reg}} passes because it uses a Scanner

In these methods, the 
[InjectIterator|https://github.com/Accla/d4m_api_java/blob/master/src/main/java/edu/mit/ll/graphulo/InjectIterator.java]
 emits entries that go beyond the seek() range.  We confirm this by placing a 
[DebugIterator|https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/DebugIterator.html]
 directly after it.

Logs when using the BatchScanner. Notice that the "m1" row is returned twice, 
once for each seek() range:

{noformat}
2015-03-05 06:05:34,768 [graphulo.BranchIterator] INFO : class 
edu.mit.ll.graphulo.InjectIterator: init on scope scan
2015-03-05 06:05:34,768 [graphulo.BranchIterator] INFO : class 
edu.mit.ll.graphulo.InjectIterator: init on scope scan
2015-03-05 06:05:34,770 [iterators.DebugIterator] DEBUG: 
init(edu.mit.ll.graphulo.InjectIterator@e9fe846, {}, 
org.apache.accumulo.tserver.TabletIteratorEnvironment@b99fd03)
2015-03-05 06:05:34,771 [iterators.DebugIterator] DEBUG: 0x516E9F1F 
seek((-inf,f%00; : [] 9223372036854775807 false), [], false)
2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() 
--> true
2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() 
--> a1 colF3:colQ3 [] 1425553534769 false
2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() 
--> true
2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() 
--> a1 colF3:colQ3 [] 1425553534769 false
2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F 
getTopValue() --> 1
2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F next()
2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() 
--> true
2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() 
--> c1 colF3:colQ3 [] 1425553534769 false
2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() 
--> true
2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() 
--> c1 colF3:colQ3 [] 1425553534769 false
2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() 
--> true
2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() 
--> c1 colF3:colQ3 [] 1425553534769 false
2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F 
getTopValue() --> 1
2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F next()
2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() 
--> true
2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() 
--> m1 colF3:colQ3 [] 1425553534769 false
2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() 
--> true
2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() 
--> m1 colF3:colQ3 [] 1425553534769 false
2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() 
--> true
2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() 
--> m1 colF3:colQ3 [] 1425553534769 false
2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F 
getTopValue() --> 1
2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F next()
2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() 
--> false
2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() 
--> false
2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() 
--> false
2015-03-05 06:05:34,770 [iterators.DebugIterator] DEBUG: 
init(edu.mit.ll.graphulo.InjectIterator@2528a1f1, {}, 
org.apache.accumulo.tserver.TabletIteratorEnvironment@244a532a)
2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA seek([f%00; 
: [] 9223372036854775807 false,+inf), [], false)
2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() 
--> true
2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA getTopKey() 
--> m1 colF3:colQ3 [] 1425553534769 false
2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() 
--> true
2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA getTopKey() 
--> m1 colF3:colQ3 [] 1425553534769 false
2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() 
--> true
2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA getTopKey() 
--> m1 colF3:colQ3 [] 1425553534769 false
2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA 
getTopValue() --> 1
2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA next()
2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() 
--> false
2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() 
--> false
2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() 
--> false
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
