Daniel Lemire created KYLIN-1034:
------------------------------------

             Summary: Faster bitmap indexes with Roaring bitmaps
                 Key: KYLIN-1034
                 URL: https://issues.apache.org/jira/browse/KYLIN-1034
             Project: Kylin
          Issue Type: Improvement
          Components: General
    Affects Versions: Future
            Reporter: Daniel Lemire


Kylin is using ConciseSet for bitmap indexing. It was found that Roaring 
bitmaps are often much better than ConciseSet (e.g., see experimental section 
in http://arxiv.org/pdf/1402.6407.pdf ). The compression is often better and 
the speed difference can be substantial. This is even more so with version 0.5 
of Roaring.

There is a high quality Java implementation that is used by Apache Spark and 
Druid.io. The Druid people found that switching to Roaring bitmaps could 
improve real-world performance by 30% or more. 

Source code: 
https://github.com/lemire/RoaringBitmap/

Importing from Maven:

http://central.maven.org/maven2/org/roaringbitmap/RoaringBitmap/0.5.3/

<dependencies>
    <dependency>
      <groupId>org.roaringbitmap</groupId>
      <artifactId>RoaringBitmap</artifactId>
      <version>[0.5,)</version>
    </dependency>
 </dependencies>

Online API :

http://lemire.me/docs/RoaringBitmap/

JavaDoc Jar: 

http://central.maven.org/maven2/org/roaringbitmap/RoaringBitmap/0.5.3/RoaringBitmap-0.5.3-javadoc.jar



When desired, the library supports memory file mapping, so that  out-of-JVM 
heap memory is used instead. This can greatly improve IO issues. The library is 
available under the Apache license and is patent-free.

There is an extensive real-data benchmark framework which you can run for 
yourself to compare Roaring with competitive alternatives such as ConciseSet :

https://github.com/lemire/RoaringBitmap/tree/master/jmh

Running such a benchmark can be as simple as launching a script.

For Druid, the bitmap format was made "pluggable" so you can switch from one 
format to the other using a configuration flag. This is implemented through 
simple wrappers, e.g., see

https://github.com/metamx/bytebuffer-collections/tree/master/src/main/java/com/metamx/collections/bitmap

So it can be really easy to make it possible to switch the format while 
preserving backward compatibility if needed... 

It is probably not difficult work to integrate Roaring in Kylin (maybe a day or 
so of programming) and it could potentially improve performance while reducing 
memory storage.

Note: Roaring bitmaps were also adopted in Apache Lucene, though they have 
their own implementation, see 
https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to