[jira] [Commented] (HBASE-11728) Some data miss when scan using PREFIX_TREE DATA-BLOCK-ENCODING

ramkrishna.s.vasudevan (JIRA) Thu, 14 Aug 2014 01:58:39 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-11728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096757#comment-14096757
 ]


ramkrishna.s.vasudevan commented on HBASE-11728:
------------------------------------------------

Here is what is happening now.
The trie structure formed for the above data is:
{code}
B    0  1  a-b-
  L  1  2      0-0
 N   1  2      A-1
B    0  3         -14023
  L  1  4               29600-1402396277
  L  1  4               97227-1402415999
  L  1  2      B-2-1402397300-1402416535
{code}
For the case where we add the column to the scan, the scan tries to find a KV 
that is greater than 
a-b-A-1,cf_1,qf_2.
So the seeker goes through these row nodes:
{code}
fan:0AB,token:a-b-,numCells:0,fanIndex:-1
fan:-,token:-1,numCells:1,fanIndex:-1
fan:29,token:14023,numCells:0,fanIndex:-1
fan:,token:9600-1402396277,numCells:1,fanIndex:-1
{code}
Finally this points to the required KV a-b-A-1-1402329600-1402396277, cf_1, 
qf_2 (fan character 2 followed by token 9600-1402396277 in the trace above). 
Since the scanner always tries to fetch the previous KV, the seeker moves back 
to the KV just before this one, following the path below:
{code}
fan:29,token:14023,numCells:0,fanIndex:0(2)
fan:-,token:-1,numCells:1,fanIndex:0(-)
{code}

So the actual next() call should advance() and return the KV from which the 
previous() was done.  But in this case the path followed is:
{code}
fan:-,token:-1,numCells:1,fanIndex:0(-)
fan:0AB,token:a-b-,numCells:0,fanIndex:1(A)
{code}

And hence next() ends up fetching the node that is directly under a-b-, 
which is the KV a-b-B-2-1402397300-1402416535.  
So the above analysis shows that previous() changes the fanIndex of the row 
node, and hence the actual next() traverses the trie structure into the 
other leaf node attached to it instead of returning to the original KV.
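
The effect can be reproduced in a toy model. The sketch below is not the HBase 
prefix-tree code: the class and method names (FanIndexSketch, Cursor, 
resetOnPrevious) are hypothetical, and the traversal rules are an assumption 
made only to illustrate the symptom described above. It rebuilds the trie from 
the dump, positions a cursor on the leaf where the seek trace ends, calls 
previous() and then next(), once leaving the fanIndex stale and once resetting 
it when previous() lands on a row node.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a trie cursor; hypothetical names, not the HBase prefix-tree classes. */
final class FanIndexSketch {

    static final class Node {
        final String token;      // text contributed to the row key, fan character included
        final boolean hasCells;  // true for leaf/nub nodes that hold a row
        final List<Node> children = new ArrayList<>();
        int fanIndex = -1;       // child last descended into; -1 means "positioned on this node"

        Node(String token, boolean hasCells, Node... kids) {
            this.token = token;
            this.hasCells = hasCells;
            for (Node k : kids) children.add(k);
        }
    }

    /** The trie from the dump in this comment. */
    static Node buildTrie() {
        Node branch = new Node("-14023", false,
                new Node("29600-1402396277", true),
                new Node("97227-1402415999", true));
        return new Node("a-b-", false,
                new Node("0-0", true),
                new Node("A-1", true, branch),
                new Node("B-2-1402397300-1402416535", true));
    }

    static final class Cursor {
        final List<Node> path = new ArrayList<>(); // root first, current node last
        final boolean resetOnPrevious;             // assumed fix: clear fanIndex on landing

        Cursor(Node root, boolean resetOnPrevious) {
            path.add(root);
            this.resetOnPrevious = resetOnPrevious;
        }

        Node current() { return path.get(path.size() - 1); }

        /** Descend by child index, recording each choice in fanIndex. */
        void seek(int... childIndexes) {
            for (int i : childIndexes) {
                Node cur = current();
                cur.fanIndex = i;
                path.add(cur.children.get(i));
            }
        }

        String row() {
            StringBuilder sb = new StringBuilder();
            for (Node n : path) sb.append(n.token);
            return sb.toString();
        }

        /** Pre-order step forward to the next node that holds cells. */
        boolean next() {
            while (!path.isEmpty()) {
                Node cur = current();
                if (cur.fanIndex + 1 < cur.children.size()) {
                    Node child = cur.children.get(++cur.fanIndex);
                    child.fanIndex = -1;
                    path.add(child);
                    if (child.hasCells) return true;
                } else {
                    path.remove(path.size() - 1); // subtree exhausted, go up
                }
            }
            return false;
        }

        /** Reverse pre-order step back to the preceding node that holds cells. */
        boolean previous() {
            while (path.size() > 1) {
                path.remove(path.size() - 1);
                Node parent = current();
                if (parent.fanIndex > 0) {
                    // Move to the preceding sibling and down to its last descendant.
                    Node n = parent.children.get(--parent.fanIndex);
                    path.add(n);
                    while (!n.children.isEmpty()) {
                        n.fanIndex = n.children.size() - 1;
                        n = n.children.get(n.fanIndex);
                        path.add(n);
                    }
                    return true; // leaves always hold cells in this toy trie
                }
                // We came up from the first child, so the parent itself is the candidate.
                if (resetOnPrevious) parent.fanIndex = -1;
                if (parent.hasCells) return true;
            }
            return false;
        }
    }

    static String rowAfterPreviousThenNext(boolean resetOnPrevious) {
        Cursor c = new Cursor(buildTrie(), resetOnPrevious);
        c.seek(1, 0, 0); // leaf 29600-1402396277 under A-1 / -14023
        c.previous();    // back to the nub a-b-A-1
        c.next();        // should return to the leaf we started from
        return c.row();
    }

    public static void main(String[] args) {
        System.out.println("stale fanIndex: " + rowAfterPreviousThenNext(false));
        System.out.println("reset fanIndex: " + rowAfterPreviousThenNext(true));
    }
}
```

With the stale fanIndex the toy cursor skips the whole A-1 subtree and lands on 
a-b-B-2-1402397300-1402416535, which matches the path in the last trace. With 
the reset it returns to a-b-A-1-1402329600-1402396277. Whether the actual fix 
takes this form is not established here.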


> Some data miss when scan using PREFIX_TREE DATA-BLOCK-ENCODING
> --------------------------------------------------------------
>
>                 Key: HBASE-11728
>                 URL: https://issues.apache.org/jira/browse/HBASE-11728
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.96.1.1, 0.98.4
>         Environment: ubuntu12 
> hadoop-2.2.0
> Hbase-0.96.1.1
> SUN-JDK(1.7.0_06-b24)
>            Reporter: wuchengzhi
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.99.0, 2.0.0, 0.98.6
>
>         Attachments: 29cb562fad564b468ea9d61a2d60e8b0, HFileAnalys.java, 
> TestPrefixTree.java
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> In the scan case, I prepared some data as below:
> Table Desc (Using the prefix-tree encoding) :
> 'prefix_tree_test', {NAME => 'cf_1', DATA_BLOCK_ENCODING => 'PREFIX_TREE', 
> TTL => '15552000'}
> and I put 5 rows:
> (RowKey , Qualifier, Value)
> 'a-b-0-0', 'qf_1', 'c1-value'
> 'a-b-A-1', 'qf_1', 'c1-value'
> 'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value'
> 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2'
> 'a-b-B-2-1402397300-1402416535', 'qf_2', 'c2-value-3'
> So I tried to scan the row keys between 'a-b-A-1' and 'a-b-A-1:', and I got 
> the correct result:
> Test 1: 
> Scan scan = new Scan();
> scan.setStartRow("a-b-A-1".getBytes());
> scan.setStopRow("a-b-A-1:".getBytes());
> ------------------------------------------------------
> 'a-b-A-1', 'qf_1', 'c1-value'
> 'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value'
> 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2'
> Then I tried the next case, adding a column to the scan with addColumn:
> Test2:
> Scan scan = new Scan();
> scan.addColumn(Bytes.toBytes("cf_1") ,  Bytes.toBytes("qf_2"));
> scan.setStartRow("a-b-A-1".getBytes());
> scan.setStopRow("a-b-A-1:".getBytes());
> ----------------------------------------------
> expected:
> 'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value'
> 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2'
> but actually I got nothing. Then I changed the addColumn call to 
> scan.addColumn(Bytes.toBytes("cf_1") ,  Bytes.toBytes("qf_1")); and I got the 
> expected result 'a-b-A-1', 'qf_1', 'c1-value'.
> Then I did more testing: I changed the case so that the startRow is greater 
> than 'a-b-A-1'.
> Test3:
> Scan scan = new Scan();
> scan.setStartRow("a-b-A-1-".getBytes());
> scan.setStopRow("a-b-A-1:".getBytes());
> ------------------------------------------------------
> expected:
> 'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value'
> 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2'
> but actually I got nothing again. I then set the start row greater than 
> 'a-b-A-1-1402329600-1402396277':
> Scan scan = new Scan();
> scan.setStartRow("a-b-A-1-140239".getBytes());
> scan.setStopRow("a-b-A-1:".getBytes());
> and I got the expected row:
> 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2'
> So I think it may be a bug in the prefix-tree encoding. It happens after the 
> data is flushed to the storefile, and it is fine while the data is in the 
> memstore.
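
For reference, the expected results of the scans in the description can be 
checked with a small in-memory model. This is only a sketch with hypothetical 
names (ExpectedScanSketch, scan): it uses a sorted map instead of HBase and 
assumes ASCII row keys, so that String ordering matches HBase's byte ordering. 
It shows why the stop row 'a-b-A-1:' covers every row beginning with 
'a-b-A-1-': the character ':' sorts after '-'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory model of the five test rows; hypothetical helper, not HBase code. */
final class ExpectedScanSketch {

    /** Row key mapped to its single qualifier, kept in sorted row order. */
    static TreeMap<String, String> table() {
        TreeMap<String, String> t = new TreeMap<>();
        t.put("a-b-0-0", "qf_1");
        t.put("a-b-A-1", "qf_1");
        t.put("a-b-A-1-1402329600-1402396277", "qf_2");
        t.put("a-b-A-1-1402397227-1402415999", "qf_2");
        t.put("a-b-B-2-1402397300-1402416535", "qf_2");
        return t;
    }

    /**
     * Rows in [startRow, stopRow) that carry the given qualifier.
     * A null qualifier means no column filter, as in Test 1.
     */
    static List<String> scan(String startRow, String stopRow, String qualifier) {
        List<String> rows = new ArrayList<>();
        Map<String, String> range = table().subMap(startRow, true, stopRow, false);
        for (Map.Entry<String, String> e : range.entrySet()) {
            if (qualifier == null || qualifier.equals(e.getValue())) {
                rows.add(e.getKey());
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println("Test 1: " + scan("a-b-A-1", "a-b-A-1:", null));
        System.out.println("Test 2: " + scan("a-b-A-1", "a-b-A-1:", "qf_2"));
        System.out.println("Test 3: " + scan("a-b-A-1-", "a-b-A-1:", null));
    }
}
```

In this model Test 2 and Test 3 both yield the two 'qf_2' rows under 
'a-b-A-1-', which is what the reporter expected and what the PREFIX_TREE 
encoded store file failed to return.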



--
This message was sent by Atlassian JIRA
(v6.2#6252)
