[jira] [Commented] (HBASE-12078) Missing Data when scanning using PREFIX_TREE DATA-BLOCK-ENCODING

zhangduo (JIRA) Wed, 24 Sep 2014 01:51:43 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146080#comment-14146080
 ]


zhangduo commented on HBASE-12078:
----------------------------------

Thank you, ramkrishna.s.vasudevan. Sorry for my poor English; I will try my best
to explain the fix.

In RowNodeReader.whichFanNode, a negative return value from
Bytes.unsignedBinarySearch means -(insertPosition + 1), where insertPosition is
relative to the whole block. So -fanIndexInBlock - 1 - fanOffset is the insert
position within the fan. Following the same convention (add 1, then negate),
the function should return
-(-fanIndexInBlock - 1 - fanOffset + 1) = fanIndexInBlock + fanOffset.
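The index arithmetic above can be sketched in Java. This is a minimal, hypothetical model, not the HBase code: WhichFanSketch, unsignedBinarySearch and whichFan are illustrative names, the fan bytes are assumed to sit at block[fanOffset, fanOffset + fanOut), and the search helper only mimics the contract described for Bytes.unsignedBinarySearch (a block-relative index, or -(insertPosition + 1)).

```java
final class WhichFanSketch {

    // Mimics the contract described for Bytes.unsignedBinarySearch: searches
    // block[fromIndex, toIndex) in unsigned byte order and returns the
    // block-relative index of key, or -(insertPosition + 1) where insertPosition
    // is also relative to the start of the whole block.
    static int unsignedBinarySearch(byte[] block, int fromIndex, int toIndex, byte key) {
        int target = key & 0xff;
        int low = fromIndex;
        int high = toIndex - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int midVal = block[mid] & 0xff;
            if (midVal < target) {
                low = mid + 1;
            } else if (midVal > target) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -(low + 1);
    }

    // Hypothetical whichFanNode: the fan bytes are assumed to occupy
    // block[fanOffset, fanOffset + fanOut). Returns the fan index on a hit,
    // otherwise -(insertPositionInFan + 1).
    static int whichFan(byte[] block, int fanOffset, int fanOut, byte searchByte) {
        int fanIndexInBlock = unsignedBinarySearch(block, fanOffset, fanOffset + fanOut, searchByte);
        if (fanIndexInBlock >= 0) {
            return fanIndexInBlock - fanOffset;
        }
        // insertPositionInFan = -fanIndexInBlock - 1 - fanOffset, so
        // -(insertPositionInFan + 1) simplifies to fanIndexInBlock + fanOffset.
        return fanIndexInBlock + fanOffset;
    }
}
```

For example, with block = {9, 9, 0x10, 0x20, 0x30}, fanOffset = 2 and fanOut = 3, searching for 0x25 gives a block-wide result of -5, and whichFan returns -3, i.e. insert position 2 inside the fan. Masking with & 0xff keeps 0xF0 after 0x30, as unsigned order requires.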

Returning fanIndexInBlock + fanOffset + 1 is wrong because the following two
situations both return 0:
 1. a fan is found at position 0
 2. no fan is found and the insert position is 0
This leads to a wrong followFan operation.
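The collision can be checked with plain arithmetic. In this hypothetical sketch (FanMissCollision and its methods are illustrative names, not HBase API), blockMissResult builds the block-wide binary-search result for a miss at a given insert position inside the fan, and the two methods apply the old and the fixed formula to it.

```java
final class FanMissCollision {

    // Block-wide binary-search result for a miss whose insert position inside
    // the fan is insertPositionInFan: -(fanOffset + insertPositionInFan + 1).
    static int blockMissResult(int fanOffset, int insertPositionInFan) {
        return -(fanOffset + insertPositionInFan + 1);
    }

    // Old formula: a miss at fan insert position 0 maps to 0, the same value
    // as a hit on fan 0.
    static int oldMissReturn(int fanIndexInBlock, int fanOffset) {
        return fanIndexInBlock + fanOffset + 1;
    }

    // Fixed formula: equals -(insertPositionInFan + 1), always negative on a miss.
    static int fixedMissReturn(int fanIndexInBlock, int fanOffset) {
        return fanIndexInBlock + fanOffset;
    }
}
```

With fanOffset = 7, a miss at insert position 0 gives a block-wide result of -8; the old formula turns that into 0 (indistinguishable from a hit on fan 0), while the fixed one returns -1.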

In PrefixTreeArraySearcher.compareToCurrentToken, the problem is that some
nodes may have a token length of 0 if they are single-byte nodes (I think the
byte is stored in the parent's fan?), so sometimes the i >= key.getRowLength()
check never runs, even when we have reached the end of the row-key part of the
key we are searching for.
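The failure mode above is the general pitfall of a bounds check that lives only inside a per-byte loop. The sketch below is a deliberately simplified, hypothetical model: the class, method names and the rowLength/tokenOffset/tokenLength parameters are assumptions, not the real compareToCurrentToken signature. With a zero-length token the loop body never executes, so the end-of-row check is skipped unless it is repeated after the loop.

```java
final class ZeroLengthTokenSketch {

    // Check only inside the loop: with tokenLength == 0 the body never runs,
    // so reaching the end of the row key is never detected.
    static boolean endOfRowReachedLoopOnly(int rowLength, int tokenOffset, int tokenLength) {
        for (int i = tokenOffset; i < tokenOffset + tokenLength; i++) {
            if (i >= rowLength) {
                return true;
            }
        }
        return false;
    }

    // Same loop plus one check after it, so a zero-length token still
    // compares the current position against the row length.
    static boolean endOfRowReachedWithFinalCheck(int rowLength, int tokenOffset, int tokenLength) {
        int i = tokenOffset;
        for (; i < tokenOffset + tokenLength; i++) {
            if (i >= rowLength) {
                return true;
            }
        }
        return i >= rowLength;
    }
}
```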

My test case reproduces these two problems, but after patching both of them,
some of the prefix-tree test cases failed, and we found another bug.

In PrefixTreeArraySearcher.fixRowFanMissReverse, the original implementation
always moves to the previous row when insertPosition is 0. But if
currentRowNode represents a row key (hasOccurrences), then the current row is
already the greatest row that is less than the row we are searching for, so
the search should stop there instead of moving to the row before it.
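The ordering argument can be checked with a toy example. Assume (hypothetically) the rows aa, ab, abc and abd, and a reverse search for abb: the node for ab has fans 'c' and 'd', so the search byte 'b' lands at insert position 0. The greatest row below abb is ab itself when that row exists, and aa otherwise. The sketch encodes only this decision; ReverseFanMissSketch and its enum are illustrative names, not HBase API.

```java
final class ReverseFanMissSketch {

    enum Target { CURRENT_ROW, PREVIOUS_ROW }

    // Hypothetical decision for a reverse search whose next byte sorts before
    // every fan of the current node (insertPosition == 0). A node's own row
    // sorts before all rows reachable through its fans, so when the node stores
    // a row (hasOccurrences) that row is already the greatest row below the key.
    static Target onInsertPositionZero(boolean currentRowNodeHasOccurrences) {
        return currentRowNodeHasOccurrences ? Target.CURRENT_ROW : Target.PREVIOUS_ROW;
    }
}
```

A sorted set gives an independent oracle: TreeSet.lower("abb") returns ab when ab is present and aa when it is not, matching the two branches.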

> Missing Data when scanning using PREFIX_TREE DATA-BLOCK-ENCODING
> ----------------------------------------------------------------
>
>                 Key: HBASE-12078
>                 URL: https://issues.apache.org/jira/browse/HBASE-12078
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.6.1
>         Environment: CentOS 6.3
> hadoop 2.5.0 (hdfs)
> hadoop 2.2.0 (hbase)
> hbase 0.98.6.1
> sun-jdk 1.7.0_67-b01
>            Reporter: zhangduo
>         Attachments: prefix_tree_error.patch
>
>
> Our row key is composed of two ints, and we found that sometimes, when we
> scan using only the first int as the prefix, the result is missing some
> rows. But when we dump the whole HFile, the rows are still there.
> We have written a test case to reproduce the bug. It works like this:
> put 1-12345
> put 12345-0x01000000
> put 12345-0x01010000
> put 12345-0x02000000
> put 12345-0x02020000
> put 12345-0x03000000
> put 12345-0x03030000
> put 12345-0x04000000
> put 12345-0x04040000
> flush memstore
> then scan using 12345; the returned row key will be
> 12345-0x20000000 (12345-0x10000000 expected)
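To see which row such a scan should return, the row keys can be modelled as two big-endian ints concatenated into 8 bytes and sorted in unsigned lexicographic byte order, which is how HBase orders row keys. This is a sketch under that encoding assumption (the report only says the key is combined from two ints); RowKeySketch and its helpers are hypothetical. Among the puts listed above, the first row at or after the 4-byte start row 12345 is the smallest row whose first int is 12345.

```java
import java.nio.ByteBuffer;
import java.util.Comparator;
import java.util.TreeSet;

final class RowKeySketch {

    // Assumed encoding: row key = first int then second int, both big-endian.
    static byte[] rowKey(int first, int second) {
        return ByteBuffer.allocate(8).putInt(first).putInt(second).array();
    }

    // Scan start row built from the first int only.
    static byte[] prefix(int first) {
        return ByteBuffer.allocate(4).putInt(first).array();
    }

    // Unsigned lexicographic byte order; a proper prefix sorts first.
    static final Comparator<byte[]> UNSIGNED = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    };

    // The rows put by the reproduction test.
    static TreeSet<byte[]> testRows() {
        TreeSet<byte[]> rows = new TreeSet<>(UNSIGNED);
        rows.add(rowKey(1, 12345));
        int[] seconds = {0x01000000, 0x01010000, 0x02000000, 0x02020000,
                         0x03000000, 0x03030000, 0x04000000, 0x04040000};
        for (int s : seconds) {
            rows.add(rowKey(12345, s));
        }
        return rows;
    }

    // First row a scan starting at startRow should return.
    static byte[] firstRowAtOrAfter(TreeSet<byte[]> rows, byte[] startRow) {
        return rows.ceiling(startRow);
    }
}
```

Under this model all eight 12345-* rows sort after the start row, and the 1-12345 row sorts before it.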



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
