[ https://issues.apache.org/jira/browse/HBASE-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721840#comment-13721840 ]
stack commented on HBASE-8980:
------------------------------
Does this have to be in core hbase? Can it be done as a decoration? As a
coprocessor?
> Assistant Store ----------- An Index Store of HRegion
> -----------------------------------------------------
>
> Key: HBASE-8980
> URL: https://issues.apache.org/jira/browse/HBASE-8980
> Project: HBase
> Issue Type: New Feature
> Components: regionserver
> Reporter: chunhui shen
> Assignee: chunhui shen
> Attachments: 8980-94.patch
>
>
> *Background*
> a. In general, we would like to keep several organizations of the same data; e.g.
> a secondary index sorts the data by a non-primary key.
> b. Currently, scanning data in HBase with a condition, such as a ValueFilter,
> seems inefficient.
> c. We could create an Assistant Store that stores the data of an HRegion in
> another organization.
> *Assistant Store*
> a. It is a store of an HRegion, like HStore, and can be created by the user by
> adding a ColumnFamily.
> b. Data in the Assistant Store is a copy of the data in the HRegion, but in another
> organization. The exception is that its row may fall outside the range of the
> HRegion, and its value is the row of the original KeyValue.
> For example,
> The region(Range:'row001'~'row999') includes the following KVs in the Store
> cf:
> row001/cf:q1/val001
> row002/cf:q1/val002
> row003/cf:q1/val003
> we could create an Assistant Store(named as) for the region which includes
> the following KVs:
> val001/cf:q1/row001
> val002/cf:q1/row002
> val003/cf:q1/row003
> c. We could use a local region transaction to ensure atomicity and
> consistency.
> e. The regionserver puts data into the Assistant Store automatically, but users
> must read the data from the Assistant Store themselves.
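
The mapping described in item b can be sketched in plain Java with no HBase dependency. This is only an illustration under stated assumptions, not the code from the attached patch: the `Cell` class, the `toAssistantCell` and `put` helpers, the family name `assist`, and the `TreeSet` standing in for a sorted store are all hypothetical. The sketch also sorts on the value so that entries sharing an indexed value stay distinct; how the patch handles that is not described in the issue.

```java
import java.util.Comparator;
import java.util.TreeSet;

class AssistantStoreSketch {
    /** Simplified stand-in for an HBase KeyValue: strings instead of bytes, no timestamp. */
    static final class Cell {
        final String row, family, qualifier, value;

        Cell(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }

        @Override
        public String toString() {
            return row + "/" + family + ":" + qualifier + "/" + value;
        }
    }

    /** Sort order of the simulated region: row, family, qualifier, then value (an assumption). */
    static final Comparator<Cell> ORDER = Comparator
            .comparing((Cell c) -> c.row)
            .thenComparing(c -> c.family)
            .thenComparing(c -> c.qualifier)
            .thenComparing(c -> c.value);

    /** Swaps row and value and places the copy in the assistant family. */
    static Cell toAssistantCell(Cell original, String assistantFamily) {
        return new Cell(original.value, assistantFamily, original.qualifier, original.row);
    }

    /** Simulates a put: the original cell and its assistant copy are stored together. */
    static void put(TreeSet<Cell> region, Cell cell, String assistantFamily) {
        region.add(cell);
        region.add(toAssistantCell(cell, assistantFamily));
    }

    public static void main(String[] args) {
        TreeSet<Cell> region = new TreeSet<>(ORDER);
        // "assist" is a hypothetical name for the assistant column family.
        put(region, new Cell("row001", "cf", "q1", "val001"), "assist");
        put(region, new Cell("row002", "cf", "q1", "val002"), "assist");
        put(region, new Cell("row003", "cf", "q1", "val003"), "assist");
        for (Cell c : region) {
            System.out.println(c);
        }
    }
}
```

In a real region the two writes would have to go through one local transaction, as item c says; the sketch simply performs them back to back.
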
> *Example of Using Assistant Store*
> a. Suppose there is an empty table named t1 with a column family named c1;
> it has only one region (the region's range is from EMPTY_START_ROW to
> EMPTY_END_ROW).
> b. Add an Assistant Store to the table by adding a new column family
> named c2.
> c. The user puts the following data into the table:
> r1/c1:q1/v1
> r2/c1:q1/v2
> r3/c1:q1/v1
> r4/c1:q1/v2
> r5/c1:q1/v1
> r6/c1:q1/v2
> d. Then the region will have the following data:
> r1/c1:q1/v1
> r2/c1:q1/v2
> r3/c1:q1/v1
> r4/c1:q1/v2
> r5/c1:q1/v1
> r6/c1:q1/v2
> v1/c2:q1/r1
> v1/c2:q1/r3
> v1/c2:q1/r5
> v2/c2:q1/r2 (Generated by Assistant, Stored in Assistant Store)
> v2/c2:q1/r4
> v2/c2:q1/r6
> e. Split the region into daughter_a and daughter_b at the split point
> 'r4';
> then daughter_a has the following data:
> r1/c1:q1/v1
> r2/c1:q1/v2
> r3/c1:q1/v1
> v1/c2:q1/r1
> v1/c2:q1/r3 (Data in Assistant Store)
> v2/c2:q1/r2
> the daughter_b has the following data:
> r4/c1:q1/v2
> r5/c1:q1/v1
> r6/c1:q1/v2
> v1/c2:q1/r5
> v2/c2:q1/r4(Data in Assistant Store)
> v2/c2:q1/r6
> f. From the above, we can see that the data in the Assistant Store always
> corresponds to the original data in the region; it is maintained by the
> regionserver.
> g. How do we use the data in the Assistant Store?
> Suppose we want to scan from 'r1' to 'r7' with a ValueFilter of value =
> 'v2'.
> Without an Assistant Store, we must scan the whole table.
> But now we can use the Assistant Store to speed up the scan:
> scan the Assistant Store from 'v2' to 'v2+' and get the following
> result:
> v2/c2:q1/r2
> v2/c2:q1/r4
> v2/c2:q1/r6
> Unfortunately, the scan result may be ordered by neither row nor value, but
> it can be made ordered by value.
> At the code level, I designed the scan on the Assistant Store as follows:
> {code}
> // Limit the scan range by row
> Scan scan = new Scan();
> scan.setStartRow(Bytes.toBytes("r1"));
> scan.setStopRow(Bytes.toBytes("r7"));
> // Scan the Assistant Store from 'v2' (inclusive) to 'v2' + 0x00 (exclusive)
> Scan assistantScan = new Scan()
>     .setStartRow(Bytes.toBytes("v2"))
>     .setStopRow(Bytes.add(Bytes.toBytes("v2"), new byte[] { 0x00 }));
> // After setting this, the region will run the scan with the assistant Scan
> scan.setAssistantScan(assistantScan);
> ResultScanner scanner = htable.getScanner(scan);
> for (Result result : scanner) {
>   // output:
>   // v2/c2:q1/r2
>   // v2/c2:q1/r4
>   // v2/c2:q1/r6
> }
> {code}
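
The lookup in item g can be modelled without HBase to show what the two ranges do. This is a rough sketch under assumptions, not the proposed implementation: the `AssistantScanSketch` class, its `buildIndex` and `scan` helpers, and the `value -> rows` map are invented for illustration, and Java string ordering stands in for HBase's byte ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

class AssistantScanSketch {
    /** Builds a simplified assistant store for one region: value -> rows holding that value. */
    static NavigableMap<String, TreeSet<String>> buildIndex(String[][] rowValuePairs) {
        NavigableMap<String, TreeSet<String>> index = new TreeMap<>();
        for (String[] rv : rowValuePairs) {
            index.computeIfAbsent(rv[1], k -> new TreeSet<>()).add(rv[0]);
        }
        return index;
    }

    /** Returns the rows in [startRow, stopRow) whose value lies in [startValue, stopValue). */
    static List<String> scan(NavigableMap<String, TreeSet<String>> index,
                             String startRow, String stopRow,
                             String startValue, String stopValue) {
        List<String> rows = new ArrayList<>();
        // Only the index entries inside the value range are visited, not the whole table.
        for (TreeSet<String> candidates
                : index.subMap(startValue, true, stopValue, false).values()) {
            // The row range of the outer scan is applied to the original rows.
            rows.addAll(candidates.subSet(startRow, true, stopRow, false));
        }
        return rows;
    }

    public static void main(String[] args) {
        String[][] data = {
            {"r1", "v1"}, {"r2", "v2"}, {"r3", "v1"},
            {"r4", "v2"}, {"r5", "v1"}, {"r6", "v2"}
        };
        NavigableMap<String, TreeSet<String>> index = buildIndex(data);
        // Value range 'v2' .. 'v2' + 0x00 matches exactly the value 'v2'.
        System.out.println(scan(index, "r1", "r7", "v2", "v2\0")); // prints [r2, r4, r6]
    }
}
```

The sketch covers a single region, where the result comes back ordered by value and then by row. With several regions each one would return its own ordered slice, which is why the combined result need not be ordered.
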
> *Implementation Dependency*
> a. Split the StoreFile by value. (Currently, we only split the file by row.)
> b. Support multi-row transactions in a region. (Already implemented)
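
Dependency a can be illustrated with the split example from the description: an assistant cell has to follow the original row stored in its value, not its own row key. The following is a minimal sketch under that reading, operating on an in-memory list rather than on StoreFiles. The `AssistantSplitSketch` class and its `Cell`, `splitKey`, `split` and `exampleRegion` members are hypothetical names, not part of HBase or the patch.

```java
import java.util.ArrayList;
import java.util.List;

class AssistantSplitSketch {
    /** Simplified stand-in for an HBase KeyValue: strings instead of bytes, no timestamp. */
    static final class Cell {
        final String row, family, qualifier, value;

        Cell(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }

        @Override
        public String toString() {
            return row + "/" + family + ":" + qualifier + "/" + value;
        }
    }

    /** Key that decides the daughter: the value for assistant cells, the row otherwise. */
    static String splitKey(Cell c, String assistantFamily) {
        return c.family.equals(assistantFamily) ? c.value : c.row;
    }

    /** Returns two lists: index 0 is daughter_a (keys < splitPoint), index 1 is daughter_b. */
    static List<List<Cell>> split(List<Cell> region, String splitPoint, String assistantFamily) {
        List<Cell> daughterA = new ArrayList<>();
        List<Cell> daughterB = new ArrayList<>();
        for (Cell c : region) {
            if (splitKey(c, assistantFamily).compareTo(splitPoint) < 0) {
                daughterA.add(c);
            } else {
                daughterB.add(c);
            }
        }
        List<List<Cell>> daughters = new ArrayList<>();
        daughters.add(daughterA);
        daughters.add(daughterB);
        return daughters;
    }

    /** The twelve cells of the example region: six in c1, six generated in c2. */
    static List<Cell> exampleRegion() {
        String[][] data = {
            {"r1", "v1"}, {"r2", "v2"}, {"r3", "v1"},
            {"r4", "v2"}, {"r5", "v1"}, {"r6", "v2"}
        };
        List<Cell> region = new ArrayList<>();
        for (String[] rv : data) {
            region.add(new Cell(rv[0], "c1", "q1", rv[1]));
        }
        for (String[] rv : data) {
            region.add(new Cell(rv[1], "c2", "q1", rv[0]));
        }
        return region;
    }

    public static void main(String[] args) {
        List<List<Cell>> daughters = split(exampleRegion(), "r4", "c2");
        System.out.println("daughter_a: " + daughters.get(0));
        System.out.println("daughter_b: " + daughters.get(1));
    }
}
```
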
> I am providing an initial patch for the 0.94 version.
> What do you think about such a Store?