Thanks Julian. I'm an even greater fan of your work.
On Sat, Sep 19, 2015 at 7:35 PM, Julian Hyde wrote:

> Thanks for the suggestion, Daniel. I'm an admirer of your work.
>
> Julian
>
> On Fri, Sep 18, 2015 at 6:43 AM, Daniel Lemire wrote:
>
> > Good day,
> >
> > I posted an issue on the Kylin JIRA:
> >
> > https://issues.apache.org/jira/browse/KYLIN-1034
> >
> > It is not very hard to replace ConciseSet by a RoaringBitmap... there are
> > API differences, but it all comes down to computing unions, differences...
> > and so forth. There are no fundamental differences. I am willing to
> > contribute code and help in other ways.
> >
> > There are architectural issues that might be interesting to discuss before
> > any code gets written. For example, are your bitmaps stored on disk, or are
> > they strictly in-memory? If they are stored on disk and immutable, then I
> > suggest that it might be interesting to consider memory mapping. This
> > avoids the expensive process of loading up the data. I see that you seem to
> > bring back bitmaps from bytes...
> >
> > Memory-mapped bitmaps are slightly slower because they go through the
> > abstraction of a ByteBuffer, but saving on the IO and data manipulation is
> > a big deal too.
> >
> > On Fri, Sep 18, 2015 at 6:50 AM, Luke Han wrote:
> >
> >> Hi Daniel,
> >>     Thank you very much for letting us know about Roaring bitmaps; it
> >> looks promising, with better performance.
> >>     We would really like to have it in our project. As you said, the
> >> effort should not be too much.
> >>     May I ask you a favor: could you bring it into our source code? We
> >> could work together to make it work for our current cases and then run
> >> some benchmarks with real cases.
> >>
> >>     Please feel free to open a Kylin JIRA
> >> <https://issues.apache.org/jira/browse/KYLIN> and put the necessary
> >> information there.
> >>     Thank you very much; we look forward to your contributions if you
> >> have time to help :)
> >>
> >> Thanks.
> >>
> >> Luke
> >>
> >> Best Regards!
> >> ---------------------
> >>
> >> Luke Han
> >>
> >> On Fri, Sep 18, 2015 at 4:08 PM, 周千昊 wrote:
> >>
> >>> Thanks Daniel for the recommendation.
> >>> Let's have a try :)
> >>>
> >>> Li Yang wrote on Fri, Sep 18, 2015 at 3:28 PM:
> >>>
> >>> > Many thanks Daniel for the information!
> >>> >
> >>> > We will definitely give RoaringBitmap
> >>> > <https://github.com/lemire/RoaringBitmap/> a try. If it boosts Spark and
> >>> > Druid.io, it can help Kylin's inverted index as well. :-)
> >>> >
> >>> > On Thu, Sep 17, 2015 at 11:16 PM, Daniel Lemire wrote:
> >>> >
> >>> > > Good day,
> >>> > >
> >>> > > I see that Kylin is using ConciseSet for bitmap indexing. In our work, we
> >>> > > found that Roaring bitmaps are often much better than ConciseSet (e.g., see
> >>> > > the experimental section in http://arxiv.org/pdf/1402.6407.pdf ). The
> >>> > > compression is often better and the speed difference can be substantial.
> >>> > > This is even more so with version 0.5 of Roaring.
> >>> > >
> >>> > > We have a high-quality Java implementation that is used by Apache Spark
> >>> > > and Druid.io. The Druid people found that switching to Roaring bitmaps
> >>> > > could improve real-world performance by 30% or more.
> >>> > >
> >>> > > https://github.com/lemire/RoaringBitmap/
> >>> > >
> >>> > > When desired, the library supports memory file mapping, so that
> >>> > > out-of-JVM heap memory is used instead. This can greatly reduce IO
> >>> > > issues. The library is available under the Apache license and is
> >>> > > patent-free.
> >>> > >
> >>> > > We have an extensive real-data benchmark framework which you can run for
> >>> > > yourself to compare Roaring with competitive alternatives such as
> >>> > > ConciseSet:
> >>> > >
> >>> > > https://github.com/lemire/RoaringBitmap/tree/master/jmh
> >>> > >
> >>> > > Running such a benchmark can be as simple as launching a script.
> >>> > >
> >>> > > What we did for Druid is to make the bitmap format "pluggable" so you can
> >>> > > switch from one format to the other using a configuration flag. This is
> >>> > > implemented through simple wrappers, e.g., see
> >>> > >
> >>> > > https://github.com/metamx/bytebuffer-collections/tree/master/src/main/java/com/metamx/collections/bitmap
> >>> > >
> >>> > > So it can be really easy to make it possible to switch the format while
> >>> > > preserving backward compatibility if needed...
> >>> > >
> >>> > > If you are interested in trying out Roaring, please let us know... We do
> >>> > > not think it is difficult work to integrate Roaring in Kylin (maybe a day
> >>> > > or so of programming) and it could potentially improve performance while
> >>> > > reducing memory storage.
> >>> > >
> >>> > > Note: Roaring bitmaps were also adopted in Apache Lucene, though they have
> >>> > > their own implementation, see
> >>> > > https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps
> >>> > >
> >>> > > --
> >>> > > Daniel Lemire for the Roaring team
> >>> > > https://github.com/lemire/
