Thanks Julian. I'm an even greater fan of your work.





On Sat, Sep 19, 2015 at 7:35 PM, Julian Hyde <[email protected]> wrote:

> Thanks for the suggestion, Daniel. I'm an admirer of your work.
>
> Julian
>
> On Fri, Sep 18, 2015 at 6:43 AM, Daniel Lemire <[email protected]> wrote:
> > Good day,
> >
> > I posted an issue on Kylin JIRA :
> >
> > https://issues.apache.org/jira/browse/KYLIN-1034
> >
> > It is not very hard to replace ConciseSet by a RoaringBitmap... there are
> > API differences, but it all comes down to computing unions, differences...
> > and so forth. There are no fundamental differences. I am willing to
> > contribute code and help in other ways.
> >
> > There are architectural issues that might be interesting to discuss before
> > any code gets written. For example, are your bitmaps stored on disk, or are
> > they strictly in-memory? If they are stored on disk and immutable, then I
> > suggest that it might be interesting to consider memory mapping. This
> > avoids the expensive process of loading up the data. I see that you seem to
> > bring back bitmaps from bytes...
> >
> > Memory-mapped bitmaps are slightly slower because they go through the
> > abstraction of a ByteBuffer, but saving on the IO and data manipulation is
> > a big deal too.
> >
> >
> >
> >
> > On Fri, Sep 18, 2015 at 6:50 AM, Luke Han <[email protected]> wrote:
> >
> >> Hi Daniel,
> >>     Thank you very much for letting us know about Roaring bitmaps; they
> >> look promising, with better performance.
> >>     We would really like to have it in our project. As you said, the
> >> effort should not be too great.
> >> May I ask you a favor: could you bring it into our source code? We could
> >> work together to make it work for our current cases and then run some
> >> benchmarks with real cases.
> >>
> >>     Please feel free to open a Kylin JIRA
> >> <https://issues.apache.org/jira/browse/KYLIN> and put the necessary
> >> information there.
> >>     Thank you very much; we look forward to your contributions if you
> >> have time to help :)
> >>
> >>      Thanks.
> >>
> >> Luke
> >>
> >>
> >>
> >> Best Regards!
> >> ---------------------
> >>
> >> Luke Han
> >>
> >> On Fri, Sep 18, 2015 at 4:08 PM, 周千昊 <[email protected]> wrote:
> >>
> >>> Thanks Daniel for the recommendation.
> >>> Let's have a try :)
> >>>
> >>> On Fri, Sep 18, 2015 at 3:28 PM, Li Yang <[email protected]> wrote:
> >>>
> >>> > Many thanks Daniel for the information!
> >>> >
> >>> > We will definitely give RoaringBitmap
> >>> > <https://github.com/lemire/RoaringBitmap/> a try. If it boosts Spark and
> >>> > Druid.io, it can help Kylin's inverted index as well.  :-)
> >>> >
> >>> > On Thu, Sep 17, 2015 at 11:16 PM, Daniel Lemire <[email protected]> wrote:
> >>> >
> >>> > > Good day,
> >>> > >
> >>> > > I see that Kylin is using ConciseSet for bitmap indexing. In our work, we
> >>> > > found that Roaring bitmaps are often much better than ConciseSet (e.g., see
> >>> > > the experimental section in http://arxiv.org/pdf/1402.6407.pdf ). The
> >>> > > compression is often better and the speed difference can be substantial.
> >>> > > This is even more so with version 0.5 of Roaring.
> >>> > >
> >>> > > We have a high quality Java implementation that is used by Apache Spark
> >>> > > and Druid.io. The Druid people found that switching to Roaring bitmaps
> >>> > > could improve real-world performance by 30% or more.
> >>> > >
> >>> > > https://github.com/lemire/RoaringBitmap/
> >>> > >
> >>> > > When desired, the library supports memory file mapping, so that
> >>> > > out-of-JVM heap memory is used instead. This can greatly improve IO
> >>> > > issues. The library is available under the Apache license and is
> >>> > > patent-free.
> >>> > >
> >>> > > We have an extensive real-data benchmark framework which you can run for
> >>> > > yourself to compare Roaring with competitive alternatives such as
> >>> > > ConciseSet:
> >>> > >
> >>> > > https://github.com/lemire/RoaringBitmap/tree/master/jmh
> >>> > >
> >>> > > Running such a benchmark can be as simple as launching a script.
> >>> > >
> >>> > > What we did for Druid is to make the bitmap format "pluggable" so you can
> >>> > > switch from one format to the other using a configuration flag. This is
> >>> > > implemented through simple wrappers, e.g., see
> >>> > >
> >>> > > https://github.com/metamx/bytebuffer-collections/tree/master/src/main/java/com/metamx/collections/bitmap
> >>> > >
> >>> > > So it can be really easy to make it possible to switch the format while
> >>> > > preserving backward compatibility if needed...
> >>> > >
> >>> > > If you are interested in trying out Roaring, please let us know... We do
> >>> > > not think it is difficult work to integrate Roaring in Kylin (maybe a day
> >>> > > or so of programming) and it could potentially improve performance while
> >>> > > reducing memory storage.
> >>> > >
> >>> > > Note: Roaring bitmaps were also adopted in Apache Lucene, though they have
> >>> > > their own implementation, see
> >>> > > https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps
> >>> > >
> >>> > > --
> >>> > >  Daniel Lemire for the Roaring team
> >>> > > https://github.com/lemire/
> >>> > >
> >>> > >
> >>> >
> >>>
> >>
> >>
>
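
The "pluggable" bitmap format mentioned in the thread (simple wrappers, with the format chosen by a configuration flag) could look roughly like the sketch below. It is written in Python only to stay dependency-free, and it is not the Druid or Kylin code: `Bitmap`, `SetBitmap`, `IntMaskBitmap`, `FORMATS`, and `make_bitmap` are hypothetical names, and the two backends are toy stand-ins rather than ConciseSet or Roaring.

```python
from abc import ABC, abstractmethod


class Bitmap(ABC):
    """Hypothetical wrapper interface; callers never see the concrete format."""

    @abstractmethod
    def add(self, value): ...

    @abstractmethod
    def to_sorted_list(self): ...

    @abstractmethod
    def union(self, other): ...

    @abstractmethod
    def difference(self, other): ...


class SetBitmap(Bitmap):
    """Toy backend: a Python set of non-negative integers."""

    def __init__(self, values=()):
        self._values = set(values)

    def add(self, value):
        self._values.add(value)

    def to_sorted_list(self):
        return sorted(self._values)

    def union(self, other):
        return SetBitmap(self._values | set(other.to_sorted_list()))

    def difference(self, other):
        return SetBitmap(self._values - set(other.to_sorted_list()))


class IntMaskBitmap(Bitmap):
    """Toy backend: one big integer used as an uncompressed bit mask."""

    def __init__(self, values=()):
        self._bits = 0
        for value in values:
            self.add(value)

    def add(self, value):
        self._bits |= 1 << value

    def to_sorted_list(self):
        result, bits, position = [], self._bits, 0
        while bits:
            if bits & 1:
                result.append(position)
            bits >>= 1
            position += 1
        return result

    def _combine(self, bits):
        combined = IntMaskBitmap()
        combined._bits = bits
        return combined

    def union(self, other):
        # Go through to_sorted_list() so mixed backends still interoperate.
        return self._combine(self._bits | IntMaskBitmap(other.to_sorted_list())._bits)

    def difference(self, other):
        return self._combine(self._bits & ~IntMaskBitmap(other.to_sorted_list())._bits)


# The "configuration flag": a format name mapped to a backend class.
FORMATS = {"set": SetBitmap, "mask": IntMaskBitmap}


def make_bitmap(format_name, values=()):
    """Build a bitmap in the configured format."""
    return FORMATS[format_name](values)
```

Because every operation goes through the shared interface, data written with one backend can still be combined with data from another, which is one way to keep backward compatibility while switching formats.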

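
The memory-mapping suggestion in the thread (query immutable on-disk data in place instead of loading it up first) can be illustrated with a small sketch. It uses Python's standard `mmap` module; on the JVM the analogous mechanism would be `FileChannel.map`, which returns a `MappedByteBuffer`. The file layout here (sorted little-endian 32-bit integers) is an assumption made purely for illustration and is not the Roaring serialization format; `write_sorted_ids`, `MappedIdSet`, and `demo` are hypothetical names.

```python
import mmap
import os
import struct
import tempfile


def write_sorted_ids(path, ids):
    """Write the ids as sorted little-endian uint32 values (toy format)."""
    with open(path, "wb") as handle:
        for value in sorted(ids):
            handle.write(struct.pack("<I", value))


class MappedIdSet:
    """Read-only membership queries served directly from the mapped file."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Length 0 maps the whole file; mapping an empty file raises ValueError.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._count = len(self._map) // 4

    def __len__(self):
        return self._count

    def _get(self, index):
        # Decode a single value in place; nothing is copied into a list.
        return struct.unpack_from("<I", self._map, index * 4)[0]

    def __contains__(self, value):
        # Binary search over the sorted values in the mapping.
        low, high = 0, self._count
        while low < high:
            mid = (low + high) // 2
            current = self._get(mid)
            if current == value:
                return True
            if current < value:
                low = mid + 1
            else:
                high = mid
        return False

    def close(self):
        self._map.close()
        self._file.close()


def demo():
    """Write a small file, map it, and run a few queries."""
    path = os.path.join(tempfile.mkdtemp(), "ids.bin")
    write_sorted_ids(path, [7, 3, 1000000, 42])
    ids = MappedIdSet(path)
    try:
        return len(ids), 42 in ids, 5 in ids
    finally:
        ids.close()
```

Each lookup pays for decoding through the mapping, which mirrors the trade-off described in the thread: slightly slower access in exchange for skipping the up-front load and deserialization.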