[
https://issues.apache.org/jira/browse/HBASE-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146080#comment-14146080
]
zhangduo commented on HBASE-12078:
----------------------------------
Thank you, ramkrishna.s.vasudevan. Sorry for my poor English; I will try my best
to explain the fix.
In RowNodeReader.whichFanNode, a negative return value from
Bytes.unsignedBinarySearch means -(insertPosition + 1), where insertPosition is
relative to the whole block. So the insert position in the block is
-fanIndexInBlock - 1, and subtracting fanOffset gives the insert position within
the fan: -fanIndexInBlock - 1 - fanOffset. Adding 1 and negating gives the value
this function should return: -(-fanIndexInBlock - 1 - fanOffset + 1) =
fanIndexInBlock + fanOffset.
Returning fanIndexInBlock + fanOffset + 1 is wrong because the following two
situations both return 0:
1. a fan is found at position 0;
2. no fan is found, and the insert position is 0.
This results in a wrong followFan operation.
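The two return conventions can be compared with a small, self-contained Java sketch. It is not HBase's RowNodeReader source: the class and method names are hypothetical, the fan is modeled as a sorted run of bytes at block[fanOffset, fanOffset + fanOut), `unsignedBinarySearch` is written out here and assumed to follow the `Arrays.binarySearch` return convention described above, and the hit branch is assumed to subtract fanOffset.

```java
// Hypothetical model of RowNodeReader.whichFanNode; not HBase's actual source.
// A node's fan bytes are assumed to occupy block[fanOffset, fanOffset + fanOut).
class WhichFanDemo {

    // Same contract as java.util.Arrays.binarySearch, but bytes compare as
    // unsigned and the result is block-relative: the index if found,
    // otherwise -(insertionPointInBlock + 1).
    static int unsignedBinarySearch(byte[] block, int from, int to, byte key) {
        int low = from;
        int high = to - 1;
        int target = key & 0xff;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int value = block[mid] & 0xff;
            if (value < target) {
                low = mid + 1;
            } else if (value > target) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -(low + 1);
    }

    // Original conversion: a miss at fan insert position p returns -p,
    // so a miss at position 0 returns 0, the same as a hit on fan 0.
    static int whichFanNodeBuggy(byte[] block, int fanOffset, int fanOut, byte b) {
        int fanIndexInBlock = unsignedBinarySearch(block, fanOffset, fanOffset + fanOut, b);
        if (fanIndexInBlock >= 0) {
            return fanIndexInBlock - fanOffset; // hit: make the index fan-relative
        }
        return fanIndexInBlock + fanOffset + 1;
    }

    // Fixed conversion: a miss returns -(insertPositionInFan + 1), always negative.
    static int whichFanNodeFixed(byte[] block, int fanOffset, int fanOut, byte b) {
        int fanIndexInBlock = unsignedBinarySearch(block, fanOffset, fanOffset + fanOut, b);
        if (fanIndexInBlock >= 0) {
            return fanIndexInBlock - fanOffset;
        }
        return fanIndexInBlock + fanOffset;
    }

    public static void main(String[] args) {
        // Three unrelated bytes, then a fan {0x10, 0x20, 0x30} starting at offset 3.
        byte[] block = {7, 7, 7, 0x10, 0x20, 0x30};
        System.out.println(whichFanNodeBuggy(block, 3, 3, (byte) 0x10)); // 0 (hit on fan 0)
        System.out.println(whichFanNodeBuggy(block, 3, 3, (byte) 0x05)); // 0 (a miss!)
        System.out.println(whichFanNodeFixed(block, 3, 3, (byte) 0x05)); // -1
    }
}
```

With a fan {0x10, 0x20, 0x30} at offset 3, the original conversion returns 0 both for a hit on 0x10 and for a miss on 0x05, while the fixed conversion returns -1 for the miss, i.e. -(0 + 1).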
In PrefixTreeArraySearcher.compareToCurrentToken, the problem is that some
nodes may have a token length of 0 when they are single-byte nodes (I think the
byte is stored in the parent's fan?). So sometimes we never get a chance to run
the i >= key.getRowLength() check, even when we have reached the end of the
row-key part of the key we are searching for.
My test case reproduces these two problems. However, after patching them, some
existing prefix-tree test cases failed, which led us to another bug.
In PrefixTreeArraySearcher.fixRowFanMissReverse, the original implementation
always moves to the previous row when insertPosition is 0. But if
currentRowNode itself represents a row key (hasOccurrences), then the current
row is already the first row less than the row we are searching for; the row
before it is not.
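A minimal model of this reverse-seek decision, assuming ASCII strings stand in for row keys (so `String.compareTo` matches unsigned byte order) and a `java.util.TreeSet` stands in for the rows in the block. The names are hypothetical; this is not the fixRowFanMissReverse source. `TreeSet.lower` serves as ground truth: the greatest row strictly less than the search key.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical model of the fixRowFanMissReverse decision; not HBase's code.
class FanMissReverseDemo {

    // Called when the search key sorts after currentRow but before every
    // child of the current node (fan insert position 0).
    static String buggy(TreeSet<String> rows, String currentRow, boolean hasOccurrences) {
        return rows.lower(currentRow); // always steps back one row
    }

    static String fixed(TreeSet<String> rows, String currentRow, boolean hasOccurrences) {
        // If the current node is itself a row, it is already the closest row
        // that sorts before the search key.
        return hasOccurrences ? currentRow : rows.lower(currentRow);
    }

    public static void main(String[] args) {
        TreeSet<String> rows = new TreeSet<>(Arrays.asList("a", "b", "ba", "bb"));
        String searchKey = "b\u0000"; // after row "b", before its first child "ba"
        System.out.println(rows.lower(searchKey));  // b (ground truth)
        System.out.println(buggy(rows, "b", true)); // a
        System.out.println(fixed(rows, "b", true)); // b
    }
}
```

When the node is a pure branch (no occurrences), stepping back is still correct, because every row under the node sorts at or after the node's prefix.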
> Missing Data when scanning using PREFIX_TREE DATA-BLOCK-ENCODING
> ----------------------------------------------------------------
>
> Key: HBASE-12078
> URL: https://issues.apache.org/jira/browse/HBASE-12078
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.6.1
> Environment: CentOS 6.3
> hadoop 2.5.0 (hdfs)
> hadoop 2.2.0 (hbase)
> hbase 0.98.6.1
> sun-jdk 1.7.0_67-b01
> Reporter: zhangduo
> Attachments: prefix_tree_error.patch
>
>
> Our row key is composed of two ints, and we found that sometimes, when we
> scan using only the first int, some rows are missing from the result. But
> when we dump the whole HFile, the rows are still there.
> We have written a test case to reproduce the bug. It works like this:
> put 1-12345
> put 12345-0x01000000
> put 12345-0x01010000
> put 12345-0x02000000
> put 12345-0x02020000
> put 12345-0x03000000
> put 12345-0x03030000
> put 12345-0x04000000
> put 12345-0x04040000
> flush the memstore
> Then scan using 12345. The returned row key is
> 12345-0x20000000 (12345-0x10000000 expected).
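For reference, the reporter's keys can be modeled as 8-byte row keys, assuming each int is encoded as 4 big-endian bytes (the report does not state the encoding; `ByteBuffer` is big-endian by default). This hypothetical sketch only computes what a correct scan on the first int must return; it does not exercise the prefix-tree encoder.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the reporter's keys; assumes 4 big-endian bytes per int.
class CompositeKeyDemo {

    static byte[] rowKey(int first, int second) {
        return ByteBuffer.allocate(8).putInt(first).putInt(second).array();
    }

    static boolean hasPrefix(byte[] row, byte[] prefix) {
        if (row.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (row[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Rows written by the test case in the report, in the same order.
    static List<byte[]> reportedRows() {
        int[][] pairs = {
            {1, 12345},
            {12345, 0x01000000}, {12345, 0x01010000},
            {12345, 0x02000000}, {12345, 0x02020000},
            {12345, 0x03000000}, {12345, 0x03030000},
            {12345, 0x04000000}, {12345, 0x04040000},
        };
        List<byte[]> rows = new ArrayList<>();
        for (int[] p : pairs) {
            rows.add(rowKey(p[0], p[1]));
        }
        return rows;
    }

    // Rows a correct scan on the first int must return.
    static List<byte[]> expectedPrefixScan(List<byte[]> rows, int first) {
        byte[] prefix = ByteBuffer.allocate(4).putInt(first).array();
        List<byte[]> out = new ArrayList<>();
        for (byte[] r : rows) {
            if (hasPrefix(r, prefix)) {
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> hits = expectedPrefixScan(reportedRows(), 12345);
        System.out.println(hits.size()); // 8
    }
}
```

For the puts listed above, all 8 rows whose first int is 12345 should come back, and 1-12345 should not.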
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)