Thanks for the suggestion, Daniel. I'm an admirer of your work.

Julian

On Fri, Sep 18, 2015 at 6:43 AM, Daniel Lemire <[email protected]> wrote:

> Good day,
>
> I posted an issue on Kylin JIRA:
>
> https://issues.apache.org/jira/browse/KYLIN-1034
>
> It is not very hard to replace ConciseSet by a RoaringBitmap... there are
> API differences, but it all comes down to computing unions, differences...
> and so forth. There are no fundamental differences. I am willing to
> contribute code and help in other ways.
>
> There are architectural issues that might be interesting to discuss before
> any code gets written. For example, are your bitmaps stored on disk, or are
> they strictly in-memory? If they are stored on disk and immutable, then I
> suggest that it might be interesting to consider memory mapping. This
> avoids the expensive process of loading up the data. I see that you seem to
> bring back bitmaps from bytes...
>
> Memory-mapped bitmaps are slightly slower because they go through the
> abstraction of a ByteBuffer, but saving on the IO and data manipulation is
> a big deal too.
>
> On Fri, Sep 18, 2015 at 6:50 AM, Luke Han <[email protected]> wrote:
>
>> Hi Daniel,
>> Thank you very much for letting us know about Roaring bitmaps; they look
>> promising, with better performance.
>> We would really like to have them in our project. As you said, the
>> effort should not be too much.
>> May I ask you a favor: could you bring it into our source code? We could
>> work together to make it work for our current use cases and then run
>> some benchmarks with real cases.
>>
>> Please feel free to open a Kylin JIRA
>> <https://issues.apache.org/jira/browse/KYLIN> and put the necessary
>> information there.
>> Thank you very much; we are looking forward to your contributions if you
>> have time to help :)
>>
>> Thanks.
>>
>> Luke
>>
>> Best Regards!
>> ---------------------
>>
>> Luke Han
>>
>> On Fri, Sep 18, 2015 at 4:08 PM, 周千昊 <[email protected]> wrote:
>>
>>> Thanks Daniel for the recommendation.
>>> Let's give it a try :)
>>>
>>> On Fri, Sep 18, 2015 at 3:28 PM, Li Yang <[email protected]> wrote:
>>>
>>> > Many thanks Daniel for the information!
>>> >
>>> > We will definitely give RoaringBitmap
>>> > <https://github.com/lemire/RoaringBitmap/> a try. If it boosts Spark
>>> > and Druid.io, it can help Kylin's inverted index as well. :-)
>>> >
>>> > On Thu, Sep 17, 2015 at 11:16 PM, Daniel Lemire <[email protected]>
>>> > wrote:
>>> >
>>> > > Good day,
>>> > >
>>> > > I see that Kylin is using ConciseSet for bitmap indexing. In our
>>> > > work, we found that Roaring bitmaps are often much better than
>>> > > ConciseSet (e.g., see experimental section in
>>> > > http://arxiv.org/pdf/1402.6407.pdf ). The compression is often
>>> > > better and the speed difference can be substantial. This is even
>>> > > more so with version 0.5 of Roaring.
>>> > >
>>> > > We have a high quality Java implementation that is used by Apache
>>> > > Spark and Druid.io. The Druid people found that switching to
>>> > > Roaring bitmaps could improve real-world performance by 30% or
>>> > > more.
>>> > >
>>> > > https://github.com/lemire/RoaringBitmap/
>>> > >
>>> > > When desired, the library supports memory file mapping, so that
>>> > > out-of-JVM heap memory is used instead. This can greatly improve IO
>>> > > issues. The library is available under the Apache license and is
>>> > > patent-free.
>>> > >
>>> > > We have an extensive real-data benchmark framework which you can
>>> > > run for yourself to compare Roaring with competitive alternatives
>>> > > such as ConciseSet:
>>> > >
>>> > > https://github.com/lemire/RoaringBitmap/tree/master/jmh
>>> > >
>>> > > Running such a benchmark can be as simple as launching a script.
>>> > >
>>> > > What we did for Druid is to make the bitmap format "pluggable" so
>>> > > you can switch from one format to the other using a configuration
>>> > > flag. This is implemented through simple wrappers, e.g., see
>>> > >
>>> > > https://github.com/metamx/bytebuffer-collections/tree/master/src/main/java/com/metamx/collections/bitmap
>>> > >
>>> > > So it can be really easy to make it possible to switch the format
>>> > > while preserving backward compatibility if needed...
>>> > >
>>> > > If you are interested in trying out Roaring, please let us know...
>>> > > We do not think it is difficult work to integrate Roaring in Kylin
>>> > > (maybe a day or so of programming) and it could potentially improve
>>> > > performance while reducing memory storage.
>>> > >
>>> > > Note: Roaring bitmaps were also adopted in Apache Lucene, though
>>> > > they have their own implementation, see
>>> > > https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps
>>> > >
>>> > > --
>>> > > Daniel Lemire for the Roaring team
>>> > > https://github.com/lemire/
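
For readers of this thread, the "pluggable" bitmap format mentioned above (simple wrappers selected by a configuration flag) might look roughly like the sketch below. It is only an illustration under stated assumptions: SimpleBitmap, AbstractSimpleBitmap, BitSetBitmap, SortedSetBitmap, BitmapFactory and the flag values "bitset" and "sortedset" are all hypothetical names, and java.util.BitSet and java.util.TreeSet stand in for real formats such as ConciseSet and RoaringBitmap. It is not Kylin's or Druid's actual code.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.TreeSet;

/** Hypothetical minimal bitmap abstraction; callers depend only on this interface. */
interface SimpleBitmap {
    void add(int value);                      // non-negative values only in this sketch
    boolean contains(int value);
    int cardinality();
    SimpleBitmap or(SimpleBitmap other);      // union
    SimpleBitmap andNot(SimpleBitmap other);  // difference
    int[] toArray();                          // values in ascending order
}

/** Shared, format-agnostic set operations so formats can be mixed freely. */
abstract class AbstractSimpleBitmap implements SimpleBitmap {
    protected abstract SimpleBitmap newEmpty();

    @Override
    public SimpleBitmap or(SimpleBitmap other) {
        SimpleBitmap result = newEmpty();
        for (int v : toArray()) result.add(v);
        for (int v : other.toArray()) result.add(v);
        return result;
    }

    @Override
    public SimpleBitmap andNot(SimpleBitmap other) {
        SimpleBitmap result = newEmpty();
        for (int v : toArray()) {
            if (!other.contains(v)) result.add(v);
        }
        return result;
    }
}

/** Stand-in format 1: wrapper around java.util.BitSet. */
final class BitSetBitmap extends AbstractSimpleBitmap {
    private final BitSet bits = new BitSet();
    @Override protected SimpleBitmap newEmpty() { return new BitSetBitmap(); }
    @Override public void add(int value) { bits.set(value); }
    @Override public boolean contains(int value) { return value >= 0 && bits.get(value); }
    @Override public int cardinality() { return bits.cardinality(); }
    @Override public int[] toArray() { return bits.stream().toArray(); }
}

/** Stand-in format 2: wrapper around a sorted set of integers. */
final class SortedSetBitmap extends AbstractSimpleBitmap {
    private final TreeSet<Integer> values = new TreeSet<>();
    @Override protected SimpleBitmap newEmpty() { return new SortedSetBitmap(); }
    @Override public void add(int value) { values.add(value); }
    @Override public boolean contains(int value) { return values.contains(value); }
    @Override public int cardinality() { return values.size(); }
    @Override public int[] toArray() {
        return values.stream().mapToInt(Integer::intValue).toArray();
    }
}

/** Chooses the format from a configuration value (hypothetical flag values). */
final class BitmapFactory {
    static SimpleBitmap create(String format) {
        switch (format) {
            case "bitset":    return new BitSetBitmap();
            case "sortedset": return new SortedSetBitmap();
            default: throw new IllegalArgumentException("Unknown bitmap format: " + format);
        }
    }
}

final class PluggableBitmapDemo {
    public static void main(String[] args) {
        // The flag could come from a config file; here it defaults to "bitset".
        String format = args.length > 0 ? args[0] : "bitset";
        SimpleBitmap a = BitmapFactory.create(format);
        SimpleBitmap b = BitmapFactory.create(format);
        a.add(1); a.add(2); a.add(3);
        b.add(3); b.add(4);
        System.out.println("union size: " + a.or(b).cardinality());
        System.out.println("difference: " + Arrays.toString(a.andNot(b).toArray()));
    }
}
```

Because calling code only sees the interface, changing the flag swaps the underlying format without touching the index logic. A real integration would presumably delegate or and andNot to each library's native operations rather than the generic loops used here.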
