[
https://issues.apache.org/jira/browse/HBASE-14782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001848#comment-15001848
]
Heng Chen commented on HBASE-14782:
-----------------------------------
Thanks [~vrodionov] for your test code.
The reason can be seen in the patch:
{code}
- // NOT FOUND -> seek next using hint
+ // NOT FOUND -> it means this row has been passed, so we jump to next row
lastFoundIndex = -1;
- return ReturnCode.SEEK_NEXT_USING_HINT;
+ return ReturnCode.NEXT_ROW;
{code}
FuzzyRowFilter should jump to the next row if the current row does not match.
Currently, if the row does not match, FuzzyRowFilter always returns SEEK_NEXT_USING_HINT.
I am not sure what the difference is between StoreScanner.seekAsDirection and
StoreScanner.seekToNextRow, but currently,
if we take the StoreScanner.seekAsDirection path (FuzzyRowFilter returns
SEEK_NEXT_USING_HINT), StoreScanner.heap.peek() returns null,
so heap is set to null in StoreScanner.close.
The related code in StoreScanner.next is below:
{code}
LOOP: do {
......
ScanQueryMatcher.MatchCode qcode = matcher.match(cell);
qcode = optimize(qcode, cell);
switch(qcode) {
.......
case SEEK_NEXT_ROW:
// This is just a relatively simple end of scan fix, to short-cut end
// us if there is an endKey in the scan.
if (!matcher.moreRowsMayExistAfter(cell)) {
return scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
}
seekToNextRow(cell);
break;
........
case SEEK_NEXT_USING_HINT:
Cell nextKV = matcher.getNextKeyHint(cell);
if (nextKV != null) {
seekAsDirection(nextKV);
} else {
heap.next();
}
break;
default:
throw new RuntimeException("UNEXPECTED");
}
} while((cell = this.heap.peek()) != null);
if (count > 0) {
return scannerContext.setScannerState(NextState.MORE_VALUES).hasMoreValues();
}
close(false); // heap is set to null here, so the remaining rows are not processed.
return scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
{code}
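To make the two paths above concrete, here is a toy model in plain Java. It is not HBase code: SeekSketch, RowFilter, scan and mismatchFilter are hypothetical names, a TreeSet of strings stands in for the scanner heap, and it assumes the hint returned on a mismatch sorts after every remaining row. Under that assumption the hinted seek finds nothing, the loop ends, and the matching row is never visited, while stepping to the next row still reaches it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class SeekSketch {
    // Simplified stand-in for the filter return codes discussed above.
    enum ReturnCode { INCLUDE, NEXT_ROW, SEEK_NEXT_USING_HINT }

    /** Hypothetical row-level filter; hint() is only consulted for SEEK_NEXT_USING_HINT. */
    interface RowFilter {
        ReturnCode filter(String row);
        String hint(String row);
    }

    /** Toy scan loop: a sorted set of row keys plays the role of the heap. */
    static List<String> scan(TreeSet<String> rows, RowFilter filter) {
        List<String> out = new ArrayList<>();
        String row = rows.isEmpty() ? null : rows.first();
        while (row != null) { // mirrors "while ((cell = heap.peek()) != null)"
            switch (filter.filter(row)) {
                case INCLUDE:
                    out.add(row);
                    row = rows.higher(row);
                    break;
                case NEXT_ROW:
                    row = rows.higher(row); // step to the next row
                    break;
                case SEEK_NEXT_USING_HINT: {
                    String hint = filter.hint(row);
                    if (hint == null || hint.compareTo(row) <= 0) {
                        row = rows.higher(row); // no usable hint: just advance
                    } else {
                        row = rows.ceiling(hint); // null if the hint is past the last row
                    }
                    break;
                }
            }
        }
        return out;
    }

    /**
     * Filter that includes only 'wanted'. On a mismatch it returns NEXT_ROW when
     * overshootingHint is null, otherwise SEEK_NEXT_USING_HINT with that hint.
     */
    static RowFilter mismatchFilter(final String wanted, final String overshootingHint) {
        return new RowFilter() {
            public ReturnCode filter(String row) {
                if (row.equals(wanted)) {
                    return ReturnCode.INCLUDE;
                }
                return overshootingHint == null
                        ? ReturnCode.NEXT_ROW
                        : ReturnCode.SEEK_NEXT_USING_HINT;
            }
            public String hint(String row) {
                return overshootingHint;
            }
        };
    }

    public static void main(String[] args) {
        TreeSet<String> rows = new TreeSet<>(Arrays.asList("r1", "r2", "r3"));
        System.out.println(scan(rows, mismatchFilter("r3", null))); // prints [r3]
        System.out.println(scan(rows, mismatchFilter("r3", "r9"))); // prints []
    }
}
```

The second call loses r3 only because the assumed hint "r9" is past the end of the data; whether the real filter produces such a hint in this case is exactly what needs to be checked.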
> FuzzyRowFilter skips valid rows
> -------------------------------
>
> Key: HBASE-14782
> URL: https://issues.apache.org/jira/browse/HBASE-14782
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Attachments: HBASE-14782.patch
>
>
> The issue may affect not only master branch, but previous releases as well.
> This is from one of our customers:
> {quote}
> We are experiencing a problem with the FuzzyRowFilter for HBase scan. We
> think that it is a bug.
> Fuzzy filter should pick a row if it matches filter criteria irrespective of
> other rows present in table but filter is dropping a row depending on some
> other row present in table.
> Details/Step to reproduce/Sample outputs below:
> Missing row key: \x9C\x00\x044\x00\x00\x00\x00
> Causing row key: \x9C\x00\x03\xE9e\xBB{X\x1Fwts\x1F\x15vRX
> Prerequisites
> 1. Create a test table. HBase shell command -- create 'fuzzytest','d'
> 2. Insert some test data. HBase shell commands:
> • put 'fuzzytest',"\x9C\x00\x044\x00\x00\x00\x00",'d:a','junk'
> • put 'fuzzytest',"\x9C\x00\x044\x01\x00\x00\x00",'d:a','junk'
> • put 'fuzzytest',"\x9C\x00\x044\x00\x01\x00\x00",'d:a','junk'
> • put 'fuzzytest',"\x9C\x00\x044\x00\x00\x01\x00",'d:a','junk'
> • put 'fuzzytest',"\x9C\x00\x044\x00\x01\x00\x01",'d:a','junk'
> • put 'fuzzytest',"\x9B\x00\x044e\xBB\xB2\xBB",'d:a','junk'
> • put 'fuzzytest',"\x9D\x00\x044e\xBB\xB2\xBB",'d:a','junk'
> Now when you run the code, you will find \x9C\x00\x044\x00\x00\x00\x00 in
> output because it matches filter criteria. (Refer how to run code below)
> Insert the row key causing bug:
> HBase shell command: put
> 'fuzzytest',"\x9C\x00\x03\xE9e\xBB{X\x1Fwts\x1F\x15vRX",'d:a','junk'
> Now when you run the code, you will not find \x9C\x00\x044\x00\x00\x00\x00 in
> output even though it still matches filter criteria.
> {quote}
> Verified the issue on master.