[ 
https://issues.apache.org/jira/browse/HBASE-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713306#comment-13713306
 ] 

chunhui shen commented on HBASE-8980:
-------------------------------------

bq.If a table has multiple column families, how many assistant stores would be 
created?
Assistant stores are created by the user; you can create one or more.

bq.Do you have some performance numbers
I have only just started this work, so I have no numbers yet.

bq.People may be more interested in seeing how scanning / filtering is done on 
top of the assistant stores.
I am adding this to the description.

bq.If you want to search by value though, you have to scan all regions, right?
Yes. You can think of each region as a sub-table, and each sub-table has only 
its own index store. 

bq.region is still partitioned by row.
Yes

                
> Assistant Store ----------- An Index Store of HRegion
> -----------------------------------------------------
>
>                 Key: HBASE-8980
>                 URL: https://issues.apache.org/jira/browse/HBASE-8980
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>         Attachments: 8980-94.patch
>
>
> *Background*
> a. Generally, we want several organizations of the same data, e.g. a 
> Secondary Index sorts the data by a non-primary key.
> b. Currently, scanning data in HBase with a condition, such as a ValueFilter, 
> seems inefficient.
> c. We could create an Assistant Store that stores the data of an HRegion in 
> another organization.
> *Assistant Store*
> a. It is a store of an HRegion, like HStore, and can be created by the user 
> by adding a ColumnFamily.
> b. Data in the Assistant Store is a copy of the data in the HRegion, but in 
> another organization. The exception is that its row may fall outside the 
> range of the HRegion, and its value is the row of the original KeyValue.
> For example, 
> The region(Range:'row001'~'row999') includes the following KVs in the Store 
> cf:
> row001/cf:q1/val001
> row002/cf:q1/val002
> row003/cf:q1/val003
> We could create an Assistant Store (named as) for the region which includes 
> the following KVs:
> val001/cf:q1/row001
> val002/cf:q1/row002
> val003/cf:q1/row003
> c. We could use a local region transaction to ensure Atomicity and 
> Consistency.
> d. The regionserver puts data into the Assistant Store automatically, but the 
> user must read the data from the Assistant Store explicitly.
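
To make item b concrete, here is a minimal sketch in plain Java (no HBase dependency) of how an index entry could be derived from an original KeyValue by swapping row and value. The names `AssistantKvSketch`, `Kv`, and `toAssistantKv` are hypothetical and made up for this illustration; they are not taken from the attached patch, which may do this differently.

```java
import java.util.ArrayList;
import java.util.List;

final class AssistantKvSketch {
    // Hypothetical stand-in for an HBase KeyValue; not the real class.
    static final class Kv {
        final String row, family, qualifier, value;

        Kv(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }

        @Override
        public String toString() {
            return row + "/" + family + ":" + qualifier + "/" + value;
        }
    }

    // Build the index entry: the original value becomes the row,
    // and the original row becomes the value.
    static Kv toAssistantKv(Kv original, String assistantFamily) {
        return new Kv(original.value, assistantFamily, original.qualifier, original.row);
    }

    public static void main(String[] args) {
        List<Kv> puts = new ArrayList<>();
        puts.add(new Kv("r1", "c1", "q1", "v1"));
        puts.add(new Kv("r2", "c1", "q1", "v2"));
        // Print each original KV next to its derived index entry.
        for (Kv kv : puts) {
            System.out.println(kv + " -> " + toAssistantKv(kv, "c2"));
        }
    }
}
```

Note that the derived row (here `v1` or `v2`) is not constrained by the region's row range, which is the exception described in item b.
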
> *Example of Using Assistant Store*
> a. Suppose there is an empty table named t1 with a column family named c1, 
> and it has only one region (the region's range is from EMPTY_START_ROW to 
> EMPTY_END_ROW).
> b. Add an Assistant Store for the table by adding a new column family 
> named c2.
> c. The user puts the following data into the table:
> r1/c1:q1/v1
> r2/c1:q1/v2
> r3/c1:q1/v1
> r4/c1:q1/v2
> r5/c1:q1/v1
> r6/c1:q1/v2
> d. Then the region will have the following data:
> r1/c1:q1/v1
> r2/c1:q1/v2
> r3/c1:q1/v1
> r4/c1:q1/v2
> r5/c1:q1/v1
> r6/c1:q1/v2
> v1/c2:q1/r1
> v1/c2:q1/r3
> v1/c2:q1/r5
> v2/c2:q1/r2 (Generated by Assistant, Stored in Assistant Store)
> v2/c2:q1/r4
> v2/c2:q1/r6
> e. Split the region into daughter_a and daughter_b at the split point 
> 'r4'. 
> Then daughter_a has the following data:
> r1/c1:q1/v1
> r2/c1:q1/v2
> r3/c1:q1/v1
> v1/c2:q1/r1
> v1/c2:q1/r3  (Data in Assistant Store)
> v2/c2:q1/r2
> and daughter_b has the following data:
> r4/c1:q1/v2
> r5/c1:q1/v1
> r6/c1:q1/v2
> v1/c2:q1/r5
> v2/c2:q1/r4  (Data in Assistant Store)
> v2/c2:q1/r6
> f. From the above, we can see that the data in the Assistant Store always 
> corresponds to the original data in the Region, and is maintained by the 
> regionserver.
> g. How do we use the data in the Assistant Store? 
> Suppose we want to do a scan from 'r1' to 'r7' with the ValueFilter value = 
> 'v2'.
> Without an Assistant Store, we must scan the whole table.
> But now we can use the Assistant Store to speed up the scan:
> Scan the Assistant Store from 'v2' to 'v2+', and get the following 
> result:
> v2/c2:q1/r2
> v2/c2:q1/r4
> v2/c2:q1/r6
> Unfortunately, the scan result may be ordered by neither row nor value, 
> although it can be made ordered by value.
> From the code point of view, I designed the scan on the Assistant Store as 
> follows:
> {code}
> // Limit the scan to the row range [r1, r7)
> Scan scan = new Scan();
> scan.setStartRow(Bytes.toBytes("r1"));
> scan.setStopRow(Bytes.toBytes("r7"));
> // Scan the Assistant Store for the value 'v2' only: [v2, v2 + 0x00)
> Scan assistantScan = new Scan();
> assistantScan.setStartRow(Bytes.toBytes("v2"));
> assistantScan.setStopRow(Bytes.add(Bytes.toBytes("v2"), new byte[] { 0x00 }));
> // Proposed API: after setting this, the region will run the scan with
> // the assistant Scan
> scan.setAssistantScan(assistantScan);
> ResultScanner scanner = htable.getScanner(scan);
> for (Result result : scanner) {
>   // Output:
>   // v2/c2:q1/r2
>   // v2/c2:q1/r4
>   // v2/c2:q1/r6
> }
> {code}
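
The lookup in item g can also be modelled without HBase. The following is a minimal sketch that assumes the Assistant Store behaves like a map sorted by value: it takes the value range ['v2', 'v2' + 0x00) and then keeps only the rows inside the requested row range. `AssistantScanSketch`, `put`, and `scan` are hypothetical names for this illustration and say nothing about how the patch implements the region-side scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

final class AssistantScanSketch {
    // Hypothetical in-memory model of an Assistant Store:
    // indexed value -> original rows that hold this value.
    private final NavigableMap<String, List<String>> index = new TreeMap<>();

    void put(String row, String value) {
        index.computeIfAbsent(value, v -> new ArrayList<>()).add(row);
    }

    // Return rows in [startRow, stopRow) whose value is in [startValue, stopValue).
    List<String> scan(String startRow, String stopRow, String startValue, String stopValue) {
        List<String> rows = new ArrayList<>();
        // Range scan on the index; only matching values are touched.
        for (List<String> candidates : index.subMap(startValue, true, stopValue, false).values()) {
            for (String row : candidates) {
                if (row.compareTo(startRow) >= 0 && row.compareTo(stopRow) < 0) {
                    rows.add(row);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        AssistantScanSketch store = new AssistantScanSketch();
        String[][] data = {
            {"r1", "v1"}, {"r2", "v2"}, {"r3", "v1"},
            {"r4", "v2"}, {"r5", "v1"}, {"r6", "v2"}
        };
        for (String[] kv : data) {
            store.put(kv[0], kv[1]);
        }
        // "v2\0" is the smallest string greater than "v2", so only 'v2' matches.
        System.out.println(store.scan("r1", "r7", "v2", "v2\0")); // prints [r2, r4, r6]
    }
}
```

In this model the rows for 'v1' are never visited, which is the saving over a full scan with a ValueFilter.
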
> *Implementation Dependency*
> a. Split the StoreFile by value. (Currently, we only split the file by row.)
> b. Support multi-row transactions in a region. (Already implemented)
> I am providing an initial patch against version 0.94. 
> What do you think about such a Store?
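
For dependency a, splitting by value means that an index entry follows the original row stored in its value, not its own row key. Here is a minimal sketch of that partitioning, reproducing the daughter_a / daughter_b example above. `AssistantSplitSketch`, `Entry`, and `split` are hypothetical names for this illustration; the real StoreFile split in the patch may work differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AssistantSplitSketch {
    // Hypothetical index entry: 'row' is the indexed value,
    // 'value' is the row of the original KeyValue.
    static final class Entry {
        final String row, value;

        Entry(String row, String value) {
            this.row = row;
            this.value = value;
        }

        @Override
        public String toString() {
            return row + "/c2:q1/" + value;
        }
    }

    // Partition index entries by comparing the stored original row
    // (entry.value) with the split point. Index 0 is daughter_a, 1 is daughter_b.
    static List<List<Entry>> split(List<Entry> entries, String splitPoint) {
        List<Entry> daughterA = new ArrayList<>();
        List<Entry> daughterB = new ArrayList<>();
        for (Entry e : entries) {
            if (e.value.compareTo(splitPoint) < 0) {
                daughterA.add(e);
            } else {
                daughterB.add(e);
            }
        }
        return Arrays.asList(daughterA, daughterB);
    }

    public static void main(String[] args) {
        List<Entry> store = Arrays.asList(
            new Entry("v1", "r1"), new Entry("v1", "r3"), new Entry("v1", "r5"),
            new Entry("v2", "r2"), new Entry("v2", "r4"), new Entry("v2", "r6"));
        List<List<Entry>> daughters = split(store, "r4");
        System.out.println("daughter_a: " + daughters.get(0));
        System.out.println("daughter_b: " + daughters.get(1));
    }
}
```
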

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
