Good day,

I see that Kylin isusing ConciseSet for bitmap indexing. In our work, we
found that Roaring bitmaps are often much better than ConciseSet (e.g., see
experimental section in http://arxiv.org/pdf/1402.6407.pdf ). The
compression is often better and the speed difference can be substantial.
This is even more so with version 0.5 of Roaring.

We have a high quality Java implementation that is used by Apache Spark and
Druid.io. The Druid people found that switching to Roaring bitmaps could
improve real-world performance by 30% or more.

https://github.com/lemire/RoaringBitmap/

When desired, the library supports memory file mapping, so that  out-of-JVM
heap memory is used instead. This can greatly improve IO issues. The
library is available under the Apache license and is patent-free.

We have an extensive real-data benchmark framework which you can run for
yourself to compare Roaring with competitive alternatives such as
ConciseSet :

https://github.com/lemire/RoaringBitmap/tree/master/jmh

Running such a benchmark can be as simple as launching a script.

What we did for Druid is to make the bitmap format "pluggable" so you can
switch from one format to the other using a configuration flag. This is
implemented through simple wrappers, e.g., see

https://github.com/metamx/bytebuffer-collections/tree/master/src/main/java/com/metamx/collections/bitmap

So it can be really easy to make it possible to switch the format while
preserving backward compatibility if needed...

If you are interested in trying out Roaring, please let us know... We do
not think it is difficult work to integrate Roaring in Kylin (maybe a day
or so of programming) and it could potentially improve performance while
reducing memory storage.

Note: Roaring bitmaps were also adopted in Apache Lucene, though they have
their own implementation, see
https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps

--
 Daniel Lemire for the Roaring team
https://github.com/lemire/

Reply via email to