[ 
https://issues.apache.org/jira/browse/HBASE-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289031#comment-13289031
 ] 

Kannan Muthukkaruppan commented on HBASE-6014:
----------------------------------------------

@Todd:

Some early questions I have:

1) A bit per block may not be very effective in many cases. For example, in the 
"mark rows as dirty" case in your description, if each HFileBlock has at least 
one dirty KV, then no blocks will get pruned. Similarly, in many classic cases, 
such as state names, it is quite possible that every block contains almost 
every state. So the feature will only be useful when selectivity is really 
narrow, i.e. when we expect only a small % of the blocks in the file to contain 
the data of interest. Is this the model/use case you are targeting? 
[Just want to make sure.]
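
To put rough numbers on the selectivity concern above, here is a 
back-of-the-envelope sketch. It assumes matching KVs are spread independently 
and uniformly across blocks, which real data often is not, and the figure of 
1000 KVs per block is purely illustrative. The function name is hypothetical 
and is not part of HBase.

```python
def expected_block_hit_fraction(p_match, kvs_per_block):
    """Expected fraction of blocks whose bit would be set.

    Assumption (not from the JIRA): each KV matches independently with
    probability p_match, so a block with kvs_per_block KVs contains at
    least one match with probability 1 - (1 - p_match) ** kvs_per_block.
    """
    return 1.0 - (1.0 - p_match) ** kvs_per_block


if __name__ == "__main__":
    # Illustrative block size only; real HFile blocks vary widely.
    kvs_per_block = 1000
    for p_match in (1e-6, 1e-4, 1e-2):
        hit = expected_block_hit_fraction(p_match, kvs_per_block)
        print(f"match rate {p_match:g}: {hit:.4%} of blocks still read")
```

Under these assumptions, even a 1% match rate sets the bit on nearly every 
block, so pruning only pays off when matches are very rare or clustered by row 
key.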

2) Also, regarding <<< metadata block included in the HFile which has a series 
of tuples: (qualifier, value range, compressed bitmap).>>>: could you clarify 
what the "value range" is about? For the "enum" type use, the tuples will be 
"qualifier, enum for value, compressed bitmap", won't they? And one such tuple 
per block for each enum, correct? Is the "value range" for cases where, say, 
you want to query the column value by range (e.g., temperature)? And is the 
idea to slice the range of values for the column (say, temperatures) into 
sub-ranges and have a bitmap per sub-range, thus allowing users to do range 
queries by consulting the appropriate bitmaps?
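
One possible reading of the "value range" idea, sketched below: split the 
column's value domain into sub-ranges, keep one block bitmap per sub-range, and 
answer a range query by OR-ing the bitmaps of every sub-range the query 
overlaps. This is only an illustration of the question being asked, not the 
design from the JIRA. All names are hypothetical, plain Python ints stand in 
for compressed bitmaps, and blocks are modelled as lists of numeric values.

```python
from bisect import bisect_right


def build_range_bitmaps(blocks, boundaries):
    """Build one block bitmap per value sub-range.

    blocks: list of lists of numeric column values, one inner list per block.
    boundaries: sorted split points; bucket 0 is (-inf, boundaries[0]),
        bucket i is [boundaries[i-1], boundaries[i]), and the last bucket
        is [boundaries[-1], +inf).
    Returns a list of ints; bit b of entry i is set when block b holds at
    least one value in bucket i.
    """
    bitmaps = [0] * (len(boundaries) + 1)
    for block_no, values in enumerate(blocks):
        for value in values:
            bucket = bisect_right(boundaries, value)
            bitmaps[bucket] |= 1 << block_no
    return bitmaps


def candidate_blocks(bitmaps, boundaries, lo, hi):
    """Return block numbers that may hold a value in [lo, hi].

    The result can include false positives, because a bucket may only
    partly overlap the query; those blocks still have to be scanned.
    """
    first = bisect_right(boundaries, lo)
    last = bisect_right(boundaries, hi)
    merged = 0
    for bucket in range(first, last + 1):
        merged |= bitmaps[bucket]
    return [b for b in range(merged.bit_length()) if (merged >> b) & 1]


if __name__ == "__main__":
    # Hypothetical temperature readings, four blocks.
    blocks = [[-5, 3], [25, 30], [45], [10, 41]]
    boundaries = [0, 20, 40]
    bitmaps = build_range_bitmaps(blocks, boundaries)
    print(candidate_blocks(bitmaps, boundaries, 22, 35))
    print(candidate_blocks(bitmaps, boundaries, 15, 25))
```

In this sketch an enum-typed column is just the special case where every 
sub-range holds a single value. Note that the index costs one bit per block 
per sub-range, so finer sub-ranges trade index size for fewer false positives.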




                
> Support for block-granularity bitmap indexes
> --------------------------------------------
>
>                 Key: HBASE-6014
>                 URL: https://issues.apache.org/jira/browse/HBASE-6014
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Todd Lipcon
>         Attachments: 6014-bitmap-hacking.txt, bitmap-hacking.txt
>
>
> This came up in a discussion with Kannan today, so I promised to write 
> something brief on JIRA -- this was suggested as a potential summer intern 
> project. The idea is as follows:
> We have several customers who periodically run full table scan MR jobs 
> against large HBase tables while applying fairly restrictive predicates. The 
> predicates are often reasonably simple boolean expressions across known 
> columns, and those columns often are enum-typed or otherwise have a fairly 
> restricted range of values. For example, a real time process may mark rows as 
> dirty, and a background MR job may scan for dirty rows in order to perform 
> further processing like rebuilding inverted indexes.
> One way to speed up this type of query is to add bitmap indexes. In the 
> context of HBase, I would envision this as a new type of metadata block 
> included in the HFile which has a series of tuples: (qualifier, value range, 
> compressed bitmap). A 1 bit in the bitmap indicates that the corresponding 
> HFile block has at least one cell for which a column with the given qualifier 
> falls within the given range. Queries which have an equality or comparison 
> predicate against an indexed qualifier can then use the bitmap index to seek 
> directly to those blocks which may contain relevant data.


        
