Many thanks Daniel for the information!

We will definitely give RoaringBitmap
<https://github.com/lemire/RoaringBitmap/> a try. If it boosts Spark and
Druid.io, it can help Kylin's inverted index as well.  :-)

On Thu, Sep 17, 2015 at 11:16 PM, Daniel Lemire <[email protected]> wrote:

> Good day,
>
> I see that Kylin isusing ConciseSet for bitmap indexing. In our work, we
> found that Roaring bitmaps are often much better than ConciseSet (e.g., see
> experimental section in http://arxiv.org/pdf/1402.6407.pdf ). The
> compression is often better and the speed difference can be substantial.
> This is even more so with version 0.5 of Roaring.
>
> We have a high quality Java implementation that is used by Apache Spark
> and Druid.io. The Druid people found that switching to Roaring bitmaps
> could improve real-world performance by 30% or more.
>
> https://github.com/lemire/RoaringBitmap/
>
> When desired, the library supports memory file mapping, so that
>  out-of-JVM heap memory is used instead. This can greatly improve IO
> issues. The library is available under the Apache license and is
> patent-free.
>
> We have an extensive real-data benchmark framework which you can run for
> yourself to compare Roaring with competitive alternatives such as
> ConciseSet :
>
> https://github.com/lemire/RoaringBitmap/tree/master/jmh
>
> Running such a benchmark can be as simple as launching a script.
>
> What we did for Druid is to make the bitmap format "pluggable" so you can
> switch from one format to the other using a configuration flag. This is
> implemented through simple wrappers, e.g., see
>
>
> https://github.com/metamx/bytebuffer-collections/tree/master/src/main/java/com/metamx/collections/bitmap
>
> So it can be really easy to make it possible to switch the format while
> preserving backward compatibility if needed...
>
> If you are interested in trying out Roaring, please let us know... We do
> not think it is difficult work to integrate Roaring in Kylin (maybe a day
> or so of programming) and it could potentially improve performance while
> reducing memory storage.
>
> Note: Roaring bitmaps were also adopted in Apache Lucene, though they have
> their own implementation, see
> https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps
>
> --
>  Daniel Lemire for the Roaring team
> https://github.com/lemire/
>
>

Reply via email to