[jira] [Commented] (HBASE-11728) Some data miss when scan using PREFIX_TREE DATA-BLOCK-ENCODING

wuchengzhi (JIRA) Thu, 14 Aug 2014 01:39:01 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-11728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096741#comment-14096741
 ]


wuchengzhi commented on HBASE-11728:
------------------------------------

I did some more testing: I downloaded the HFile to my local disk and debugged it 
with HFileReaderV2 and HFileScan. 
Here is what I found. It may not be useful, but I want to share it.

byte[] startRow = KeyValue.createFirstDeleteFamilyOnRow("a-b-A-1-".getBytes(), 
Bytes.toBytes("cf_1")).getKey();
HFileScan.seekTo(startRow) ;  
//a-b-A-1/cf_1:/LATEST_TIMESTAMP/DeleteFamily/vlen=0/mvcc=0
After the seek completes, ptSearcher.current() of this seeker is 
a-b-0-0/cf_1:qf_1/1407908210357/Put/vlen=8/mvcc=0

Then I call HFileScan.next() to get the rows with the prefix "a-b-A-1",
and I get the output below:
a-b-A-1/cf_1:qf_1/1407908210411/Put/vlen=8/mvcc=0
a-b-A-1-1402329600-1402396277/cf_1:qf_2/1407908210445/Put/vlen=8/mvcc=0
a-b-A-1-1402397227-1402415999/cf_1:qf_2/1407908210494/Put/vlen=10/mvcc=0

I also captured some debug info from PrefixTreeArrayScanner before advance():
seeker.current() is a-b-A-1/cf_1:qf_1/1407908210411/Put/vlen=8/mvcc=0

currentRowNode: fan:-,token:-1,numCells:1,fanIndex:-1
rowNodes:
[0]fan:0AB,token:a-b-,numCells:0,fanIndex:1(A)
[1]fan:-,token:-1,numCells:1,fanIndex:-1

Then I changed the startRow to 
"a-b-A-1-/cf_1:/LATEST_TIMESTAMP/DeleteFamily/vlen=0/mvcc=0" and tested again.

After the seek completes, ptSearcher.current() of this seeker is  
a-b-A-1/cf_1:qf_1/1407908210411/Put/vlen=8/mvcc=0
but when I call HFileScan.next(), I get nothing.


Here is the debug info from PrefixTreeArrayScanner before advance():

seeker.current() is a-b-A-1/cf_1:qf_1/1407908210411/Put/vlen=8/mvcc=0

currentRowNode: fan:-,token:-1,numCells:1,fanIndex:0(-)
rowNodes:
[0]fan:0AB,token:a-b-,numCells:0,fanIndex:1(A)
[1]fan:-,token:-1,numCells:1,fanIndex:0(-)
[2]fan:29,token:14023,numCells:0,fanIndex:0(2)
[3]fan:,token:9600-1402396277,numCells:1,fanIndex:-1

Then, after next(), seeker.current() is 
a-b-B-2-1402397300-1402416535/cf_1:qf_2/1407908210526/Put/vlen=10/mvcc=0


The seeker is at the same location in both cases, yet the tree nodes differ. I think that is abnormal.
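
To make the two rowNodes dumps easier to compare, here is a minimal sketch that spells out the row key each node stack would represent. It is not HBase code: the class RowNodeStack and its helpers are hypothetical, and it assumes that a row key in the prefix tree is each node's token followed by the fan byte selected by fanIndex, and that every entry listed in a dump is on the active path.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: rebuilds a row key from a rowNodes dump.
class RowNodeStack {
    /** One dump line: the node's token and the fan byte picked by fanIndex (null when fanIndex is -1). */
    static final class Node {
        final String token;
        final Character selectedFan;

        Node(String token, Character selectedFan) {
            this.token = token;
            this.selectedFan = selectedFan;
        }
    }

    static Node node(String token, Character selectedFan) {
        return new Node(token, selectedFan);
    }

    /** Assumption: key = token, then the fan byte leading to the next node, repeated down the stack. */
    static String rowKey(List<Node> stack) {
        StringBuilder key = new StringBuilder();
        for (Node n : stack) {
            key.append(n.token);
            if (n.selectedFan != null) {
                key.append(n.selectedFan.charValue());
            }
        }
        return key.toString();
    }

    /** Stack from the first dump (entries [0] and [1]). */
    static String firstDumpKey() {
        return rowKey(Arrays.asList(node("a-b-", 'A'), node("-1", null)));
    }

    /** Stack from the second dump (entries [0] through [3]). */
    static String secondDumpKey() {
        return rowKey(Arrays.asList(
                node("a-b-", 'A'),
                node("-1", '-'),
                node("14023", '2'),
                node("9600-1402396277", null)));
    }

    public static void main(String[] args) {
        System.out.println(firstDumpKey());
        System.out.println(secondDumpKey());
    }
}
```

Under those assumptions the first stack spells a-b-A-1, while the second spells a-b-A-1-1402329600-1402396277, even though seeker.current() reports a-b-A-1 in both cases.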

> Some data miss when scan using PREFIX_TREE DATA-BLOCK-ENCODING
> --------------------------------------------------------------
>
>                 Key: HBASE-11728
>                 URL: https://issues.apache.org/jira/browse/HBASE-11728
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.96.1.1, 0.98.4
>         Environment: ubuntu12 
> hadoop-2.2.0
> Hbase-0.96.1.1
> SUN-JDK(1.7.0_06-b24)
>            Reporter: wuchengzhi
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.99.0, 2.0.0, 0.98.6
>
>         Attachments: TestPrefixTree.java
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> For the scan case, I prepared some data as below:
> Table Desc (Using the prefix-tree encoding) :
> 'prefix_tree_test', {NAME => 'cf_1', DATA_BLOCK_ENCODING => 'PREFIX_TREE', 
> TTL => '15552000'}
> and I put 5 rows:
> (RowKey , Qualifier, Value)
> 'a-b-0-0', 'qf_1', 'c1-value'
> 'a-b-A-1', 'qf_1', 'c1-value'
> 'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value'
> 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2'
> 'a-b-B-2-1402397300-1402416535', 'qf_2', 'c2-value-3'
> Then I scanned the row keys between 'a-b-A-1' and 'a-b-A-1:', and I got the 
> correct result:
> Test 1: 
> Scan scan = new Scan();
> scan.setStartRow("a-b-A-1".getBytes());
> scan.setStopRow("a-b-A-1:".getBytes());
> ------------------------------------------------------
> 'a-b-A-1', 'qf_1', 'c1-value'
> 'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value'
> 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2'
> Next, I restricted the scan to one column with addColumn:
> Test 2:
> Scan scan = new Scan();
> scan.addColumn(Bytes.toBytes("cf_1") ,  Bytes.toBytes("qf_2"));
> scan.setStartRow("a-b-A-1".getBytes());
> scan.setStopRow("a-b-A-1:".getBytes());
> ----------------------------------------------
> expected:
> 'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value'
> 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2'
> but actually I got nothing. Then I changed the addColumn call to 
> scan.addColumn(Bytes.toBytes("cf_1") ,  Bytes.toBytes("qf_1")); and I got the 
> expected result 'a-b-A-1', 'qf_1', 'c1-value'.
> Then I did more testing: I changed the startRow to a value greater 
> than 'a-b-A-1':
> Test 3:
> Scan scan = new Scan();
> scan.setStartRow("a-b-A-1-".getBytes());
> scan.setStopRow("a-b-A-1:".getBytes());
> ------------------------------------------------------
> expected:
> 'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value'
> 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2'
> but actually I got nothing again. Then I changed the start row to a value greater than 
> 'a-b-A-1-1402329600-1402396277':
> Scan scan = new Scan();
> scan.setStartRow("a-b-A-1-140239".getBytes());
> scan.setStopRow("a-b-A-1:".getBytes());
> and I got the expected row:
> 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2'
> So I think it may be a bug in the prefix-tree encoding. It happens after the 
> data is flushed to the store file; it works correctly while the data is still in the memstore.
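
As a cross-check of the expected results quoted above, here is a minimal reference model in plain Java. It does not use HBase at all. It assumes only that a scan returns rows whose keys fall in [startRow, stopRow) under unsigned lexicographic byte order, optionally filtered by qualifier. The class ExpectedScanRows and its method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reference model, not HBase code: which of the five rows should a scan return?
class ExpectedScanRows {
    // {rowKey, qualifier} pairs copied from the issue description.
    static final String[][] CELLS = {
        {"a-b-0-0", "qf_1"},
        {"a-b-A-1", "qf_1"},
        {"a-b-A-1-1402329600-1402396277", "qf_2"},
        {"a-b-A-1-1402397227-1402415999", "qf_2"},
        {"a-b-B-2-1402397300-1402416535", "qf_2"},
    };

    /** Unsigned lexicographic comparison; a shorter key sorts before any key it prefixes. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** Rows in [startRow, stopRow); pass null as qualifier to accept every column. */
    static List<String> rowsInRange(String startRow, String stopRow, String qualifier) {
        byte[] start = startRow.getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRow.getBytes(StandardCharsets.UTF_8);
        List<String> rows = new ArrayList<>();
        for (String[] cell : CELLS) {
            byte[] row = cell[0].getBytes(StandardCharsets.UTF_8);
            boolean inRange = compareUnsigned(row, start) >= 0 && compareUnsigned(row, stop) < 0;
            boolean columnMatches = qualifier == null || qualifier.equals(cell[1]);
            if (inRange && columnMatches) {
                rows.add(cell[0]);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(rowsInRange("a-b-A-1", "a-b-A-1:", null));        // Test 1
        System.out.println(rowsInRange("a-b-A-1", "a-b-A-1:", "qf_2"));      // Test 2
        System.out.println(rowsInRange("a-b-A-1-", "a-b-A-1:", null));       // Test 3
        System.out.println(rowsInRange("a-b-A-1-140239", "a-b-A-1:", null)); // last case
    }
}
```

Because '-' (0x2D) sorts before ':' (0x3A), the stop row 'a-b-A-1:' bounds every key that starts with 'a-b-A-1-'. The model therefore yields three rows for Test 1, two for Test 2 and Test 3, and one for the last case, which agrees with the expected results in the description.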



--
This message was sent by Atlassian JIRA
(v6.2#6252)
