Many thanks, Daniel, for the information! We will definitely give RoaringBitmap <https://github.com/lemire/RoaringBitmap/> a try. If it boosts Spark and Druid.io, it can help Kylin's inverted index as well. :-)

On Thu, Sep 17, 2015 at 11:16 PM, Daniel Lemire <[email protected]> wrote:

> Good day,
>
> I see that Kylin is using ConciseSet for bitmap indexing. In our work, we
> found that Roaring bitmaps are often much better than ConciseSet (e.g., see
> experimental section in http://arxiv.org/pdf/1402.6407.pdf ). The
> compression is often better and the speed difference can be substantial.
> This is even more so with version 0.5 of Roaring.
>
> We have a high quality Java implementation that is used by Apache Spark
> and Druid.io. The Druid people found that switching to Roaring bitmaps
> could improve real-world performance by 30% or more.
>
> https://github.com/lemire/RoaringBitmap/
>
> When desired, the library supports memory file mapping, so that
> out-of-JVM heap memory is used instead. This can greatly improve IO
> issues. The library is available under the Apache license and is
> patent-free.
>
> We have an extensive real-data benchmark framework which you can run for
> yourself to compare Roaring with competitive alternatives such as
> ConciseSet :
>
> https://github.com/lemire/RoaringBitmap/tree/master/jmh
>
> Running such a benchmark can be as simple as launching a script.
>
> What we did for Druid is to make the bitmap format "pluggable" so you can
> switch from one format to the other using a configuration flag. This is
> implemented through simple wrappers, e.g., see
>
> https://github.com/metamx/bytebuffer-collections/tree/master/src/main/java/com/metamx/collections/bitmap
>
> So it can be really easy to make it possible to switch the format while
> preserving backward compatibility if needed...
>
> If you are interested in trying out Roaring, please let us know... We do
> not think it is difficult work to integrate Roaring in Kylin (maybe a day
> or so of programming) and it could potentially improve performance while
> reducing memory storage.
>
> Note: Roaring bitmaps were also adopted in Apache Lucene, though they have
> their own implementation, see
> https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps
>
> --
> Daniel Lemire for the Roaring team
> https://github.com/lemire/
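
The "pluggable" bitmap format mentioned in the quoted message could be sketched roughly as below. This is only an illustration under stated assumptions, not the Druid or Kylin code: the names RowBitmap, BitSetRowBitmap, SortedSetRowBitmap and RowBitmapFactory, and the format strings "bitset" and "sorted", are all hypothetical. The two implementations are standard-library stand-ins; in a real integration each wrapper would presumably delegate to ConciseSet or RoaringBitmap instead.

```java
import java.util.BitSet;
import java.util.TreeSet;

/** Hypothetical minimal interface; a real wrapper would expose more operations. */
interface RowBitmap {
    void add(int rowId);
    boolean contains(int rowId);
    int cardinality();
}

/** Stand-in for one bitmap format, backed by java.util.BitSet. */
final class BitSetRowBitmap implements RowBitmap {
    private final BitSet bits = new BitSet();

    @Override public void add(int rowId) { bits.set(rowId); }
    @Override public boolean contains(int rowId) { return bits.get(rowId); }
    @Override public int cardinality() { return bits.cardinality(); }
}

/** Stand-in for a second bitmap format, backed by a sorted set of row ids. */
final class SortedSetRowBitmap implements RowBitmap {
    private final TreeSet<Integer> rows = new TreeSet<>();

    @Override public void add(int rowId) { rows.add(rowId); }
    @Override public boolean contains(int rowId) { return rows.contains(rowId); }
    @Override public int cardinality() { return rows.size(); }
}

/** Chooses the implementation from a configuration value (hypothetical flag values). */
final class RowBitmapFactory {
    private RowBitmapFactory() {}

    static RowBitmap create(String format) {
        switch (format) {
            case "bitset":
                return new BitSetRowBitmap();
            case "sorted":
                return new SortedSetRowBitmap();
            default:
                throw new IllegalArgumentException("Unknown bitmap format: " + format);
        }
    }
}
```

Calling code would depend only on RowBitmap, for example `RowBitmap index = RowBitmapFactory.create("bitset");`, so switching formats would mean changing the configured string rather than the indexing code.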
