[ 
https://issues.apache.org/jira/browse/HBASE-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183104#comment-15183104
 ] 

Daniel Lemire commented on HBASE-6014:
--------------------------------------

Roaring bitmaps (http://roaringbitmap.org/) might be a good choice for the 
compressed bitmap format. They are used by Apache Spark, Druid, Apache Kylin, 
Apache Lucene, and others. They are not patented and are available under the 
Apache license.

> Support for block-granularity bitmap indexes
> --------------------------------------------
>
>                 Key: HBASE-6014
>                 URL: https://issues.apache.org/jira/browse/HBASE-6014
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Todd Lipcon
>         Attachments: 6014-bitmap-hacking.txt, bitmap-hacking.txt
>
>
> This came up in a discussion with Kannan today, so I promised to write 
> something brief on JIRA -- this was suggested as a potential summer intern 
> project. The idea is as follows:
> We have several customers who periodically run full table scan MR jobs 
> against large HBase tables while applying fairly restrictive predicates. The 
> predicates are often reasonably simple boolean expressions across known 
> columns, and those columns often are enum-typed or otherwise have a fairly 
> restricted range of values. For example, a real-time process may mark rows as 
> dirty, and a background MR job may scan for dirty rows in order to perform 
> further processing like rebuilding inverted indexes.
> One way to speed up this type of query is to add bitmap indexes. In the 
> context of HBase, I would envision this as a new type of metadata block 
> included in the HFile which has a series of tuples: (qualifier, value range, 
> compressed bitmap). A 1 bit in the bitmap indicates that the corresponding 
> HFile block has at least one cell for which a column with the given qualifier 
> falls within the given range. Queries that have an equality or comparison 
> predicate against an indexed qualifier can then use the bitmap index to seek 
> directly to those blocks which may contain relevant data.
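
The (qualifier, value range, bitmap) tuples described above can be sketched roughly as follows. This is only an illustration, not HBase code: the class and method names are hypothetical, values are modelled as ints (as for an enum-typed column), and java.util.BitSet stands in for whatever compressed bitmap format would actually be stored in the HFile metadata block. It also assumes the configured ranges cover every value the column can take; a value outside all ranges would not be indexed.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch of a block-granularity bitmap index; not an HBase API. */
public class BlockBitmapIndexSketch {

    /** One index tuple: (qualifier, value range, bitmap over block numbers). */
    static final class Entry {
        final String qualifier;
        final int minValue; // inclusive
        final int maxValue; // inclusive
        final BitSet blocks = new BitSet(); // stand-in for a compressed bitmap

        Entry(String qualifier, int minValue, int maxValue) {
            this.qualifier = qualifier;
            this.minValue = minValue;
            this.maxValue = maxValue;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    /** Declares an indexed value range for a qualifier. */
    public void addRange(String qualifier, int minValue, int maxValue) {
        entries.add(new Entry(qualifier, minValue, maxValue));
    }

    /** Write path: note that block blockId holds a cell with this qualifier and value. */
    public void record(int blockId, String qualifier, int value) {
        for (Entry e : entries) {
            if (e.qualifier.equals(qualifier) && value >= e.minValue && value <= e.maxValue) {
                e.blocks.set(blockId);
            }
        }
    }

    /** Read path: blocks that may hold a cell whose value lies in [lo, hi]. */
    public BitSet candidateBlocks(String qualifier, int lo, int hi) {
        BitSet result = new BitSet();
        for (Entry e : entries) {
            // Any indexed range overlapping the query range contributes its blocks.
            if (e.qualifier.equals(qualifier) && e.maxValue >= lo && e.minValue <= hi) {
                result.or(e.blocks);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BlockBitmapIndexSketch index = new BlockBitmapIndexSketch();
        index.addRange("state", 0, 0); // 0 = clean (assumed encoding)
        index.addRange("state", 1, 1); // 1 = dirty (assumed encoding)

        index.record(0, "state", 0);
        index.record(1, "state", 1);
        index.record(2, "state", 0);
        index.record(3, "state", 0);
        index.record(3, "state", 1);
        index.record(4, "state", 0);

        // A scan for dirty rows only needs to visit blocks 1 and 3.
        System.out.println(index.candidateBlocks("state", 1, 1)); // prints {1, 3}
    }
}
```

Because a set bit only means "at least one matching cell may be in this block", the scan still has to apply the real predicate to the cells it reads; the index only lets it skip blocks whose bit is clear. Boolean expressions across several columns could be handled by combining the per-column candidate sets with BitSet.and and BitSet.or before seeking.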



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
