Thanks for the suggestion, Daniel. I'm an admirer of your work.

Julian

On Fri, Sep 18, 2015 at 6:43 AM, Daniel Lemire <[email protected]> wrote:
> Good day,
>
> I posted an issue on the Kylin JIRA:
>
> https://issues.apache.org/jira/browse/KYLIN-1034
>
> It is not very hard to replace ConciseSet by a RoaringBitmap... there are
> API differences, but it all comes down to computing unions, differences...
> and so forth. There are no fundamental differences. I am willing to
> contribute code and help in other ways.
>
> There are architectural issues that might be interesting to discuss before
> any code gets written. For example, are your bitmaps stored on disk, or are
> they strictly in-memory? If they are stored on disk and immutable, then I
> suggest that it might be interesting to consider memory mapping. This
> avoids the expensive process of loading up the data. I see that you seem to
> bring back bitmaps from bytes...
>
> Memory-mapped bitmaps are slightly slower because they go through the
> abstraction of a ByteBuffer, but saving on the IO and data manipulation is
> a big deal too.
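A rough sketch of the memory-mapping idea above, using only the JDK. It assumes an uncompressed bitmap stored as raw 64-bit words, which is a stand-in and not the Roaring or ConciseSet format; the class MappedBitmapSketch and its methods are hypothetical names. The point is that bits are tested through the mapped buffer, with no step that rebuilds the bitmap on the heap.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.LongBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.BitSet;

// Hypothetical helper, not part of Kylin or RoaringBitmap.
final class MappedBitmapSketch {

    // Write the bitmap once as raw 64-bit words (uncompressed stand-in format).
    static Path write(BitSet bits) {
        try {
            Path file = Files.createTempFile("bitmap", ".bin");
            long[] words = bits.toLongArray();
            ByteBuffer buf = ByteBuffer.allocate(words.length * Long.BYTES);
            buf.asLongBuffer().put(words);
            Files.write(file, buf.array());
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Map the file read-only; the words live outside the JVM heap.
    static LongBuffer map(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size()).asLongBuffer();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Test one bit directly against the mapped words.
    static boolean contains(LongBuffer words, int i) {
        int w = i >>> 6; // index of the 64-bit word holding bit i
        return w < words.limit() && (words.get(w) & (1L << i)) != 0;
    }
}
```

Every lookup goes through the buffer's accessor, which is the small per-access cost mentioned above; in exchange, opening the bitmap costs a map call rather than a full read and rebuild.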
>
>
>
>
> On Fri, Sep 18, 2015 at 6:50 AM, Luke Han <[email protected]> wrote:
>
>> Hi Daniel,
>>     Thank you very much for letting us know about Roaring bitmaps; they look
>> promising, with better performance.
>>     We would really like to have them in our project. As you said, the
>> effort should not be too much.
>> May I ask you a favor: could you bring it into our source code? We could work
>> together to make it work for
>> our current cases and then run some benchmarks with real cases.
>>
>>     Please feel free to open a Kylin JIRA
>> <https://issues.apache.org/jira/browse/KYLIN> and put the necessary
>> information there.
>>     Thank you very much; we look forward to your contributions if you
>> have time to help :)
>>
>>      Thanks.
>>
>> Luke
>>
>>
>>
>> Best Regards!
>> ---------------------
>>
>> Luke Han
>>
>> On Fri, Sep 18, 2015 at 4:08 PM, 周千昊 <[email protected]> wrote:
>>
>>> Thanks Daniel for the recommendation.
>>> Let's have a try :)
>>>
>>> On Fri, Sep 18, 2015 at 3:28 PM, Li Yang <[email protected]> wrote:
>>>
>>> > Many thanks Daniel for the information!
>>> >
>>> > We will definitely give RoaringBitmap
>>> > <https://github.com/lemire/RoaringBitmap/> a try. If it boosts Spark and
>>> > Druid.io, it can help Kylin's inverted index as well.  :-)
>>> >
>>> > On Thu, Sep 17, 2015 at 11:16 PM, Daniel Lemire <[email protected]> wrote:
>>> >
>>> > > Good day,
>>> > >
>>> > > I see that Kylin is using ConciseSet for bitmap indexing. In our work, we
>>> > > found that Roaring bitmaps are often much better than ConciseSet (e.g., see
>>> > > the experimental section in http://arxiv.org/pdf/1402.6407.pdf ). The
>>> > > compression is often better and the speed difference can be substantial.
>>> > > This is even more so with version 0.5 of Roaring.
>>> > >
>>> > > We have a high-quality Java implementation that is used by Apache Spark
>>> > > and Druid.io. The Druid people found that switching to Roaring bitmaps
>>> > > could improve real-world performance by 30% or more.
>>> > >
>>> > > https://github.com/lemire/RoaringBitmap/
>>> > >
>>> > > When desired, the library supports memory-mapped files, so that
>>> > > memory outside the JVM heap is used instead. This can greatly reduce IO
>>> > > issues. The library is available under the Apache license and is
>>> > > patent-free.
>>> > >
>>> > > We have an extensive real-data benchmark framework which you can run for
>>> > > yourself to compare Roaring with competitive alternatives such as
>>> > > ConciseSet:
>>> > >
>>> > > https://github.com/lemire/RoaringBitmap/tree/master/jmh
>>> > >
>>> > > Running such a benchmark can be as simple as launching a script.
>>> > >
>>> > > What we did for Druid is to make the bitmap format "pluggable" so you can
>>> > > switch from one format to the other using a configuration flag. This is
>>> > > implemented through simple wrappers, e.g., see
>>> > >
>>> > > https://github.com/metamx/bytebuffer-collections/tree/master/src/main/java/com/metamx/collections/bitmap
>>> > >
>>> > > So it can be really easy to make it possible to switch the format while
>>> > > preserving backward compatibility if needed...
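The "pluggable format" idea above could look roughly like the following. This is a JDK-only sketch under stated assumptions: the interface, the two implementations and the factory are all hypothetical names, and the two formats here (a BitSet and a sorted set) are stand-ins for what would be wrappers around ConciseSet and RoaringBitmap in a real integration. It does not reproduce the Druid wrappers linked above.

```java
import java.util.BitSet;
import java.util.TreeSet;

// Every name below is hypothetical; none mirrors Kylin's or Druid's classes.
// Values are assumed to be non-negative integers.
interface Bitmap {
    void add(int value);
    boolean contains(int value);
    int cardinality();
}

// First stand-in format: a plain uncompressed bit set.
final class BitSetBitmap implements Bitmap {
    private final BitSet bits = new BitSet();
    public void add(int value) { bits.set(value); }
    public boolean contains(int value) { return bits.get(value); }
    public int cardinality() { return bits.cardinality(); }
}

// Second stand-in format: a sorted set of integers.
final class SortedSetBitmap implements Bitmap {
    private final TreeSet<Integer> values = new TreeSet<>();
    public void add(int value) { values.add(value); }
    public boolean contains(int value) { return values.contains(value); }
    public int cardinality() { return values.size(); }
}

// The only place that knows which format a configuration value selects.
final class BitmapFactory {
    static Bitmap create(String format) {
        switch (format) {
            case "bitset": return new BitSetBitmap();
            case "sorted": return new SortedSetBitmap();
            default:
                throw new IllegalArgumentException("unknown bitmap format: " + format);
        }
    }
}
```

Calling code only ever sees the Bitmap interface, so switching formats is a matter of passing a different configuration string to BitmapFactory.create, and the old format can stay available for data that was already written with it.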
>>> > >
>>> > > If you are interested in trying out Roaring, please let us know... We do
>>> > > not think it is difficult work to integrate Roaring in Kylin (maybe a day
>>> > > or so of programming) and it could potentially improve performance while
>>> > > reducing memory storage.
>>> > >
>>> > > Note: Roaring bitmaps were also adopted in Apache Lucene, though they have
>>> > > their own implementation, see
>>> > > https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps
>>> > >
>>> > > --
>>> > >  Daniel Lemire for the Roaring team
>>> > > https://github.com/lemire/
>>> > >
>>> > >
>>> >
>>>
>>
>>
