[ https://issues.apache.org/jira/browse/HBASE-14782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001848#comment-15001848 ]
Heng Chen commented on HBASE-14782:
-----------------------------------

Thanks [~vrodionov] for your test code. The cause is visible in the patch:

{code}
-            // NOT FOUND -> seek next using hint
+            // NOT FOUND -> it means this row has been passed, so we jump to next row
             lastFoundIndex = -1;
-            return ReturnCode.SEEK_NEXT_USING_HINT;
+            return ReturnCode.NEXT_ROW;
{code}

FuzzyRowFilter should jump to the next row if the current row does not match. Currently, when a row does not match, FuzzyRowFilter always returns SEEK_NEXT_USING_HINT.

I am not sure what the difference is between StoreScanner.seekAsDirection and StoreScanner.seekToNextRow, but currently, if we take the StoreScanner.seekAsDirection path (FuzzyRowFilter returns SEEK_NEXT_USING_HINT), StoreScanner.heap.peek() returns null, so the heap is set to null in StoreScanner.close.

The related code in StoreScanner.next is below:

{code}
LOOP: do {
  ......
  ScanQueryMatcher.MatchCode qcode = matcher.match(cell);
  qcode = optimize(qcode, cell);
  switch(qcode) {
    .......
    case SEEK_NEXT_ROW:
      // This is just a relatively simple end of scan fix, to short-cut end
      // us if there is an endKey in the scan.
      if (!matcher.moreRowsMayExistAfter(cell)) {
        return scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
      }
      seekToNextRow(cell);
      break;
    ........
    case SEEK_NEXT_USING_HINT:
      Cell nextKV = matcher.getNextKeyHint(cell);
      if (nextKV != null) {
        seekAsDirection(nextKV);
      } else {
        heap.next();
      }
      break;
    default:
      throw new RuntimeException("UNEXPECTED");
  }
} while((cell = this.heap.peek()) != null);

if (count > 0) {
  return scannerContext.setScannerState(NextState.MORE_VALUES).hasMoreValues();
}

close(false); // heap will be set to null, so the remaining rows will not be processed.
return scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
{code}

> FuzzyRowFilter skips valid rows
> -------------------------------
>
>                 Key: HBASE-14782
>                 URL: https://issues.apache.org/jira/browse/HBASE-14782
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>         Attachments: HBASE-14782.patch
>
>
> The issue may affect not only the master branch but previous releases as well.
> This is from one of our customers:
> {quote}
> We are experiencing a problem with the FuzzyRowFilter for HBase scan. We
> think that it is a bug.
> The fuzzy filter should pick a row if it matches the filter criteria, irrespective of
> other rows present in the table, but the filter is dropping a row depending on some
> other row present in the table.
> Details/Steps to reproduce/Sample outputs below:
> Missing row key: \x9C\x00\x044\x00\x00\x00\x00
> Causing row key: \x9C\x00\x03\xE9e\xBB{X\x1Fwts\x1F\x15vRX
> Prerequisites
> 1. Create a test table. HBase shell command: create 'fuzzytest','d'
> 2. Insert some test data. HBase shell commands:
>    • put 'fuzzytest',"\x9C\x00\x044\x00\x00\x00\x00",'d:a','junk'
>    • put 'fuzzytest',"\x9C\x00\x044\x01\x00\x00\x00",'d:a','junk'
>    • put 'fuzzytest',"\x9C\x00\x044\x00\x01\x00\x00",'d:a','junk'
>    • put 'fuzzytest',"\x9C\x00\x044\x00\x00\x01\x00",'d:a','junk'
>    • put 'fuzzytest',"\x9C\x00\x044\x00\x01\x00\x01",'d:a','junk'
>    • put 'fuzzytest',"\x9B\x00\x044e\xBB\xB2\xBB",'d:a','junk'
>    • put 'fuzzytest',"\x9D\x00\x044e\xBB\xB2\xBB",'d:a','junk'
> Now when you run the code, you will find \x9C\x00\x044\x00\x00\x00\x00 in
> the output because it matches the filter criteria. (Refer to how to run the code below.)
> Insert the row key causing the bug:
> HBase shell command: put
> 'fuzzytest',"\x9C\x00\x03\xE9e\xBB{X\x1Fwts\x1F\x15vRX",'d:a','junk'
> Now when you run the code, you will not find \x9C\x00\x044\x00\x00\x00\x00 in
> the output even though it still matches the filter criteria.
> {quote}
>
> Verified the issue on master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
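The customer's expectation can be illustrated with a minimal, self-contained Java sketch in which each row is judged on its own and a non-matching row only advances the scan to the next row (the NEXT_ROW behaviour from the patch), so one row can never hide another. This is a toy model, not HBase code: `fuzzyMatches`, `compareRows`, `scan`, `reproRows`, `KEY` and `MASK` are hypothetical, because the filter the customer actually used is not shown in the issue. The mask convention (0 = fixed byte, 1 = any byte) is assumed to follow the one described for FuzzyRowFilter's fuzzy info, and the length check is simplified. It does not model StoreScanner, seek hints, or the heap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Toy model only: not HBase code. All names below are hypothetical.
public class FuzzyScanSketch {

    // Hypothetical fuzzy key: first four bytes (\x9C\x00\x04\x34) fixed, last four wildcards.
    static final byte[] KEY = b(0x9C, 0x00, 0x04, 0x34, 0x00, 0x00, 0x00, 0x00);
    // Assumed mask convention: 0 = byte must match KEY, 1 = any byte accepted.
    static final byte[] MASK = b(0, 0, 0, 0, 1, 1, 1, 1);

    static byte[] b(int... values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (byte) values[i];
        }
        return out;
    }

    // Simplified match: the row must be at least as long as the key, and every
    // fixed position must hold the same byte as the key.
    static boolean fuzzyMatches(byte[] row, byte[] key, byte[] mask) {
        if (row.length < key.length) {
            return false;
        }
        for (int i = 0; i < key.length; i++) {
            if (mask[i] == 0 && row[i] != key[i]) {
                return false;
            }
        }
        return true;
    }

    // Unsigned lexicographic order, the order in which HBase sorts row keys.
    static int compareRows(byte[] a, byte[] c) {
        int n = Math.min(a.length, c.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, c[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, c.length);
    }

    // Visit rows in sorted order and judge each row on its own. A non-matching row
    // only advances the scan to the next row, like NEXT_ROW in the patch.
    static List<byte[]> scan(List<byte[]> rows, byte[] key, byte[] mask) {
        TreeSet<byte[]> sorted = new TreeSet<>(FuzzyScanSketch::compareRows);
        sorted.addAll(rows);
        List<byte[]> result = new ArrayList<>();
        for (byte[] row : sorted) {
            if (fuzzyMatches(row, key, mask)) {
                result.add(row);
            }
        }
        return result;
    }

    // Row keys from the reproduction steps; the 17-byte "causing" row is optional.
    static List<byte[]> reproRows(boolean includeCausingRow) {
        List<byte[]> rows = new ArrayList<>(Arrays.asList(
            b(0x9C, 0x00, 0x04, 0x34, 0x00, 0x00, 0x00, 0x00),
            b(0x9C, 0x00, 0x04, 0x34, 0x01, 0x00, 0x00, 0x00),
            b(0x9C, 0x00, 0x04, 0x34, 0x00, 0x01, 0x00, 0x00),
            b(0x9C, 0x00, 0x04, 0x34, 0x00, 0x00, 0x01, 0x00),
            b(0x9C, 0x00, 0x04, 0x34, 0x00, 0x01, 0x00, 0x01),
            b(0x9B, 0x00, 0x04, 0x34, 0x65, 0xBB, 0xB2, 0xBB),
            b(0x9D, 0x00, 0x04, 0x34, 0x65, 0xBB, 0xB2, 0xBB)));
        if (includeCausingRow) {
            rows.add(b(0x9C, 0x00, 0x03, 0xE9, 0x65, 0xBB, 0x7B, 0x58, 0x1F,
                       0x77, 0x74, 0x73, 0x1F, 0x15, 0x76, 0x52, 0x58));
        }
        return rows;
    }

    public static void main(String[] args) {
        byte[] missingRow = b(0x9C, 0x00, 0x04, 0x34, 0x00, 0x00, 0x00, 0x00);
        for (boolean withCausingRow : new boolean[] {false, true}) {
            List<byte[]> hits = scan(reproRows(withCausingRow), KEY, MASK);
            boolean found = hits.stream().anyMatch(r -> Arrays.equals(r, missingRow));
            System.out.println("causing row present=" + withCausingRow
                + " matches=" + hits.size() + " missing row found=" + found);
        }
    }
}
```

With this hypothetical key, `main` reports five matches, including \x9C\x00\x044\x00\x00\x00\x00, both with and without the causing row, because the causing row (\x03 at position 2) fails the fixed-byte check and the scan simply moves past it. The unsigned comparator matters: HBase orders row keys as unsigned bytes, so a signed `byte` comparison would put rows starting with \x9C before rows starting with \x00.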