Good day, I posted an issue on the Kylin JIRA:

https://issues.apache.org/jira/browse/KYLIN-1034

It is not very hard to replace ConciseSet with RoaringBitmap. There are API differences, but it all comes down to computing unions, differences, and so forth; there are no fundamental differences. I am willing to contribute code and to help in other ways.

There are architectural issues that might be worth discussing before any code gets written. For example, are your bitmaps stored on disk, or are they strictly in memory? If they are stored on disk and immutable, then it might be worth considering memory mapping, which avoids the expensive process of loading the data. I see that you seem to deserialize bitmaps from bytes. Memory-mapped bitmaps are slightly slower because every access goes through the ByteBuffer abstraction, but saving the I/O and the deserialization work is a big deal too.

On Fri, Sep 18, 2015 at 6:50 AM, Luke Han <[email protected]> wrote:

> Hi Daniel,
> Thank you very much for letting us know about Roaring bitmaps; they look
> promising, with better performance.
> We would really like to have it in our project, and as you said, the effort
> should not be too much.
> May I ask you a favor: could you bring it into our source code? We could work
> together to make it work for our current cases and then run some benchmarks
> on real cases.
>
> Please feel free to open a Kylin JIRA
> <https://issues.apache.org/jira/browse/KYLIN> and put the necessary
> information there.
> Thank you very much, and looking forward to your contributions if you
> have time to help :)
>
> Thanks.
>
> Luke
>
> Best Regards!
> ---------------------
>
> Luke Han
>
> On Fri, Sep 18, 2015 at 4:08 PM, 周千昊 <[email protected]> wrote:
>
>> Thanks, Daniel, for the recommendation.
>> Let's give it a try :)
>>
>> On Fri, Sep 18, 2015 at 3:28 PM, Li Yang <[email protected]> wrote:
>>
>> > Many thanks, Daniel, for the information!
>> >
>> > We will definitely give RoaringBitmap
>> > <https://github.com/lemire/RoaringBitmap/> a try.
>> > If it boosts Spark and Druid.io, it can help Kylin's inverted index as
>> > well. :-)
>> >
>> > On Thu, Sep 17, 2015 at 11:16 PM, Daniel Lemire <[email protected]> wrote:
>> >
>> > > Good day,
>> > >
>> > > I see that Kylin is using ConciseSet for bitmap indexing. In our work, we
>> > > found that Roaring bitmaps are often much better than ConciseSet (e.g.,
>> > > see the experimental section in http://arxiv.org/pdf/1402.6407.pdf). The
>> > > compression is often better and the speed difference can be substantial.
>> > > This is even more so with version 0.5 of Roaring.
>> > >
>> > > We have a high-quality Java implementation that is used by Apache Spark
>> > > and Druid.io. The Druid developers found that switching to Roaring
>> > > bitmaps could improve real-world performance by 30% or more.
>> > >
>> > > https://github.com/lemire/RoaringBitmap/
>> > >
>> > > When desired, the library supports memory-mapped files, so that memory
>> > > outside the JVM heap is used instead. This can greatly reduce I/O costs.
>> > > The library is available under the Apache license and is patent-free.
>> > >
>> > > We have an extensive real-data benchmark framework that you can run
>> > > yourself to compare Roaring with competing alternatives such as
>> > > ConciseSet:
>> > >
>> > > https://github.com/lemire/RoaringBitmap/tree/master/jmh
>> > >
>> > > Running such a benchmark can be as simple as launching a script.
>> > >
>> > > What we did for Druid was to make the bitmap format "pluggable", so you
>> > > can switch from one format to the other using a configuration flag. This
>> > > is implemented through simple wrappers; e.g., see
>> > >
>> > > https://github.com/metamx/bytebuffer-collections/tree/master/src/main/java/com/metamx/collections/bitmap
>> > >
>> > > So it can be easy to make the format switchable while preserving
>> > > backward compatibility, if needed.
>> > >
>> > > If you are interested in trying out Roaring, please let us know. We do
>> > > not think it would be difficult to integrate Roaring into Kylin (maybe a
>> > > day or so of programming), and it could improve performance while
>> > > reducing memory usage.
>> > >
>> > > Note: Roaring bitmaps were also adopted in Apache Lucene, though Lucene
>> > > has its own implementation; see
>> > > https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps
>> > >
>> > > --
>> > > Daniel Lemire for the Roaring team
>> > > https://github.com/lemire/
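To make the memory-mapping question above a bit more concrete, here is a minimal, JDK-only sketch, assuming bitmaps are immutable once written to disk. It deliberately uses a plain uncompressed java.util.BitSet word layout, not the Roaring or ConciseSet serialization format, and the class and its methods (MappedBitmapSketch, write, map, contains, unionCardinality, andNotCardinality) are hypothetical names for illustration only. Unions and differences are counted word by word straight from a read-only MappedByteBuffer view, so no bitmap is rebuilt from bytes on the JVM heap.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.BitSet;

// Hypothetical sketch: an immutable bitmap written once to disk, then queried
// through a read-only memory map instead of being deserialized onto the heap.
// The on-disk layout is plain uncompressed 64-bit words (java.util.BitSet),
// NOT the Roaring or ConciseSet serialization format.
public class MappedBitmapSketch {

    // Write the bitmap as little-endian 64-bit words, with no header.
    static void write(BitSet bits, Path file) throws IOException {
        long[] words = bits.toLongArray();
        ByteBuffer buf = ByteBuffer.allocate(words.length * Long.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.asLongBuffer().put(words);
        Files.write(file, buf.array());
    }

    // Map the file read-only; its pages live outside the JVM heap.
    static LongBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // The mapping stays valid after the channel is closed.
            return mapped.order(ByteOrder.LITTLE_ENDIAN).asLongBuffer();
        }
    }

    // Membership test: reads a single 64-bit word through the buffer.
    static boolean contains(LongBuffer words, int i) {
        int w = i >>> 6;                        // index of the word holding bit i
        if (w >= words.limit()) return false;   // beyond the last stored word
        return (words.get(w) & (1L << i)) != 0; // long shifts use only the low 6 bits of i
    }

    // Read word w, treating words past the end as zero.
    private static long word(LongBuffer words, int w) {
        return w < words.limit() ? words.get(w) : 0L;
    }

    // |A OR B|, computed word by word without materializing the union.
    static int unionCardinality(LongBuffer a, LongBuffer b) {
        int n = Math.max(a.limit(), b.limit());
        int count = 0;
        for (int w = 0; w < n; w++) {
            count += Long.bitCount(word(a, w) | word(b, w));
        }
        return count;
    }

    // |A AND NOT B|, i.e. the size of the set difference A \ B.
    static int andNotCardinality(LongBuffer a, LongBuffer b) {
        int count = 0;
        for (int w = 0; w < a.limit(); w++) {
            count += Long.bitCount(word(a, w) & ~word(b, w));
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        BitSet a = new BitSet();
        a.set(1); a.set(5); a.set(100);
        BitSet b = new BitSet();
        b.set(5); b.set(200);

        Path fa = Files.createTempFile("bitmap-a", ".bin");
        Path fb = Files.createTempFile("bitmap-b", ".bin");
        fa.toFile().deleteOnExit();
        fb.toFile().deleteOnExit();
        write(a, fa);
        write(b, fb);

        LongBuffer ma = map(fa);
        LongBuffer mb = map(fb);
        System.out.println(contains(ma, 100));         // true
        System.out.println(unionCardinality(ma, mb));  // 4: {1, 5, 100, 200}
        System.out.println(andNotCardinality(ma, mb)); // 2: {1, 100}
    }
}
```

Each contains() call goes through LongBuffer.get() rather than a plain array read, which is the slight slowdown mentioned above, but opening a bitmap costs a single map() call instead of reading and decoding the whole file. With the RoaringBitmap library itself, the corresponding class is org.roaringbitmap.buffer.ImmutableRoaringBitmap, which can be constructed over a (possibly memory-mapped) ByteBuffer holding a serialized bitmap.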
