[jira] Commented: (HBASE-2426) [Transactional Contrib] Introduce quick scanning row-based secondary indexes

George P. Stathis (JIRA) Thu, 15 Apr 2010 15:07:12 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857565#action_12857565
 ]


George P. Stathis commented on HBASE-2426:
------------------------------------------

Clint,

Thank you for the feedback. After more consideration during this past week and 
after reviewing your comment, I think that I should retract any speed 
improvement claims regarding this patch. I started on this contrib focusing 
solely on scans but I naively neglected to account for the column iterations. 
If one takes everything into account, I do agree that there should not be much 
of a speed difference between what's already there and the 
RowBasedIndexSpecification. In terms of IO, fetching a single row and then 
iterating over its columns in memory might have an edge over scanning through 
files, but I'm not sure about this either; I'm still new to this technology 
stack and I don't know whether scanning through more rows means going through 
more files. Some actual performance tests should be run to see if that 
statement even holds (or someone more knowledgeable like you should set me 
straight :-) ). 

So, at the very least, the JavaDoc should be amended to reflect this.

As it turns out though, this contrib is definitely useful when used in 
conjunction with https://issues.apache.org/jira/browse/HBASE-2438. Since there 
is currently no reliable way to paginate through rows, a row-based indexing 
approach can at least guarantee that the pages returned contain the number of 
rows requested. Our application does leverage pagination, so we will be able to 
use this, at least until reliable row-based pagination comes along. After that, 
it may be six of one and half a dozen of the other. One thing that the new 
contrib does not 
offer over the current solution is the ability to store additional column 
values in the index for further filtering. This might be a deal-breaker for 
some folks.
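
To illustrate the pagination point, here is a minimal in-memory sketch. It 
assumes that a page is taken as an offset/limit slice over the column 
qualifiers of a single index row (each qualifier being a base table row key). 
The class and method names are hypothetical; this is not the HBase client API 
and not the code in the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory model of paging through one index row.
// Assumption: the qualifiers are the base table row keys, already sorted.
class IndexRowPagingSketch {

    // Returns at most `limit` base row keys, starting at position `offset`.
    static List<String> page(List<String> qualifiers, int offset, int limit) {
        if (offset < 0 || limit < 0) {
            throw new IllegalArgumentException("offset and limit must be >= 0");
        }
        int from = Math.min(offset, qualifiers.size());
        // Use long arithmetic so a very large limit cannot overflow.
        int to = (int) Math.min((long) from + limit, (long) qualifiers.size());
        return new ArrayList<>(qualifiers.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> qualifiers = Arrays.asList("row1", "row2", "row3", "row4", "row5");
        // Second page of size 2.
        System.out.println(page(qualifiers, 2, 2));
    }
}
```

Because every qualifier corresponds to exactly one base table row, a full 
page in this model always contains exactly the number of rows requested; 
only the last page can be shorter.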

Let me know what you think. If people don't have any use for this except for 
column-based pagination, maybe it's not worth adding.

> [Transactional Contrib] Introduce quick scanning row-based secondary indexes
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-2426
>                 URL: https://issues.apache.org/jira/browse/HBASE-2426
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: contrib
>            Reporter: George P. Stathis
>            Priority: Minor
>             Fix For: 0.20.5, 0.21.0
>
>         Attachments: hbase-2426-0.20-branch.patch
>
>
> RowBasedIndexSpecification is a specialized IndexSpecification class for 
> creating row-based secondary index tables. Base table rows with the same 
> indexed column value have their row keys stored as column qualifiers on the 
> same secondary index table row. The key for that row is the indexed column 
> value from the base table. This avoids expensive secondary index 
> table scans and provides faster access for applications such as foreign key 
> indexing or queries such as "find all table A rows whose familyA:columnB 
> value is X". RowBasedIndexSpecification indices can be scanned using the API 
> on RowBasedIndexedTable. The metadata for RowBasedIndexSpecification differ 
> from IndexSpecification in that:
> - Only a single base table column can be indexed per 
> RowBasedIndexSpecification. No additional columns are put in the index table.
> and 
> - RowBasedIndexKeyGenerator, which constructs the index-row-key from the 
> indexed column value in the original column, is always used.
> For a simple RowBasedIndexSpecification example, look at the 
> TestRowBasedIndexedTable unit test in 
> org.apache.hadoop.hbase.client.tableIndexed.
> To enable RowBasedIndexSpecification indexing, modify hbase-site.xml to turn 
> on the
> IndexedRegionServer.  This is done by setting
> - hbase.regionserver.class to 
> org.apache.hadoop.hbase.ipc.IndexedRegionInterface and
> - hbase.regionserver.impl to 
> org.apache.hadoop.hbase.regionserver.tableindexed.RowBasedIndexedRegionServer
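
The index layout described above can be sketched with plain Java collections. 
This is only an in-memory model under stated assumptions: the class and 
method names are hypothetical, values and row keys are modelled as strings, 
and nothing here uses the HBase API or reproduces the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory model of a row-based secondary index table.
class RowBasedIndexSketch {

    // Index row key (the indexed column value) -> column qualifiers
    // (the row keys of the base table rows holding that value).
    private final Map<String, SortedSet<String>> indexTable = new TreeMap<>();

    // Record that the base row `baseRowKey` has `indexedValue` in the indexed column.
    void put(String baseRowKey, String indexedValue) {
        indexTable.computeIfAbsent(indexedValue, k -> new TreeSet<>()).add(baseRowKey);
    }

    // "Find all base rows whose indexed column value is X": a single index
    // row fetch in this model, rather than a scan over many index rows.
    List<String> lookup(String indexedValue) {
        SortedSet<String> qualifiers = indexTable.get(indexedValue);
        return qualifiers == null ? new ArrayList<>() : new ArrayList<>(qualifiers);
    }

    public static void main(String[] args) {
        RowBasedIndexSketch index = new RowBasedIndexSketch();
        index.put("row3", "X");
        index.put("row1", "X");
        index.put("row2", "Y");
        System.out.println(index.lookup("X"));
    }
}
```

Note that the model stores only base row keys under each indexed value, which 
mirrors the restriction that no additional columns are put in the index table.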

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
