Thanks Daniel for the recommendation. Let's give it a try :)

Li Yang <[email protected]> wrote on Fri, Sep 18, 2015 at 3:28 PM:
> Many thanks Daniel for the information!
>
> We will definitely give RoaringBitmap
> <https://github.com/lemire/RoaringBitmap/> a try. If it boosts Spark and
> Druid.io, it can help Kylin's inverted index as well. :-)
>
> On Thu, Sep 17, 2015 at 11:16 PM, Daniel Lemire <[email protected]> wrote:
>
> > Good day,
> >
> > I see that Kylin is using ConciseSet for bitmap indexing. In our work, we
> > found that Roaring bitmaps are often much better than ConciseSet (e.g.,
> > see the experimental section in http://arxiv.org/pdf/1402.6407.pdf ). The
> > compression is often better and the speed difference can be substantial.
> > This is even more so with version 0.5 of Roaring.
> >
> > We have a high-quality Java implementation that is used by Apache Spark
> > and Druid.io. The Druid people found that switching to Roaring bitmaps
> > could improve real-world performance by 30% or more.
> >
> > https://github.com/lemire/RoaringBitmap/
> >
> > When desired, the library supports memory file mapping, so that
> > out-of-JVM heap memory is used instead. This can greatly reduce IO
> > issues. The library is available under the Apache license and is
> > patent-free.
> >
> > We have an extensive real-data benchmark framework which you can run for
> > yourself to compare Roaring with competing alternatives such as
> > ConciseSet:
> >
> > https://github.com/lemire/RoaringBitmap/tree/master/jmh
> >
> > Running such a benchmark can be as simple as launching a script.
> >
> > What we did for Druid is to make the bitmap format "pluggable" so you can
> > switch from one format to the other using a configuration flag. This is
> > implemented through simple wrappers, e.g., see
> >
> > https://github.com/metamx/bytebuffer-collections/tree/master/src/main/java/com/metamx/collections/bitmap
> >
> > So it can be really easy to make it possible to switch the format while
> > preserving backward compatibility if needed...
> >
> > If you are interested in trying out Roaring, please let us know... We do
> > not think it is difficult work to integrate Roaring in Kylin (maybe a day
> > or so of programming) and it could potentially improve performance while
> > reducing memory storage.
> >
> > Note: Roaring bitmaps were also adopted in Apache Lucene, though they
> > have their own implementation, see
> > https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps
> >
> > --
> > Daniel Lemire for the Roaring team
> > https://github.com/lemire/
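
For anyone following the thread, the "pluggable" bitmap format described above could look roughly like the sketch below. It is only an illustration under assumptions: the names SimpleBitmap, BitSetBitmap, SortedSetBitmap, SimpleBitmapFactory and the bitmap.format property are hypothetical and come from neither Druid nor Kylin, and the two implementations are standard-library stand-ins rather than ConciseSet or Roaring.

```java
import java.util.BitSet;
import java.util.TreeSet;

// Hypothetical minimal abstraction over a set of non-negative ints.
// The wrappers in Druid's bytebuffer-collections are richer than this.
interface SimpleBitmap {
    void add(int value);
    boolean contains(int value);
    int cardinality();
}

// Stand-in for one format, backed by java.util.BitSet.
final class BitSetBitmap implements SimpleBitmap {
    private final BitSet bits = new BitSet();
    public void add(int value) { bits.set(value); }
    public boolean contains(int value) { return bits.get(value); }
    public int cardinality() { return bits.cardinality(); }
}

// Stand-in for a second format, backed by a sorted set.
final class SortedSetBitmap implements SimpleBitmap {
    private final TreeSet<Integer> values = new TreeSet<>();
    public void add(int value) { values.add(value); }
    public boolean contains(int value) { return values.contains(value); }
    public int cardinality() { return values.size(); }
}

// Chooses the implementation from a configuration string, so callers
// depend only on the SimpleBitmap interface.
final class SimpleBitmapFactory {
    static SimpleBitmap create(String format) {
        switch (format) {
            case "bitset": return new BitSetBitmap();
            case "sorted": return new SortedSetBitmap();
            default:
                throw new IllegalArgumentException("Unknown bitmap format: " + format);
        }
    }
}

final class PluggableBitmapDemo {
    public static void main(String[] args) {
        // Assumed flag name; run with -Dbitmap.format=sorted to switch.
        String format = System.getProperty("bitmap.format", "bitset");
        SimpleBitmap bitmap = SimpleBitmapFactory.create(format);
        bitmap.add(3);
        bitmap.add(42);
        bitmap.add(3); // duplicate, does not change the cardinality
        System.out.println(format + " cardinality: " + bitmap.cardinality());
    }
}
```

With a layer like this, adding Roaring would presumably mean one more wrapper class and one more case in the factory, while existing data could keep using the old format.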
