Good day Luke,

A patch has been added to the JIRA :

https://issues.apache.org/jira/browse/KYLIN-1034

I have also issued a PR on GitHub:

https://github.com/apache/incubator-kylin/pull/12

The patch is straight-forward, and simply replaces Concise by Roaring
throughout.

The relevant unit tests appear to pass.

Further review, testing and benchmarking is encouraged. The purpose of this
patch
is to get the process started.

To keep things simple, I did not do *any* redesign. Still... here are my
thoughts...

Design-wise : It does look to me like the bitmaps are serialized to streams
of bytes.
>From there, *immutable* bitmaps are reloaded on demand, then possibly
copied and modified.
The Roaring library has a class ideally suited for this purpose, called
ImmutableRoaringBitmap...
>From any ByteBuffer, you can map directly a bitmap :
https://github.com/lemire/RoaringBitmap/blob/master/examples/ImmutableRoaringBitmapExample.java
Compared to deserializing a bitmap from a stream of bytes, this approach
avoids copying
and parsing the data: constructing an ImmutableRoaringBitmap is very fast
and uses very
little memory. Because they are formally immutable, you only need one
instance in your entire
application, irrespective of the number of cores. The data is accessed only
when the
ImmutableRoaringBitmap is actually queried, and what is accessed is the
original stream of
bytes (no unnecessary copy is made). So it uses less memory.

Making us of ImmutableRoaringBitmap and mapped bitmaps in kylin would not
be difficult,
programming-wise, but this would make the patch more difficult to review.

(I'll recopy some of my comments on JIRA.)


As usual, the copyright of this patch and be assigned to whoever... should
you choose
to use it. This patch or the Roaring library itself are *not* covered by
patents. And
so forth.



On Sun, Sep 27, 2015 at 2:03 PM, Daniel Lemire <[email protected]> wrote:

> Thanks for clarifying.
>
> Let me see what we can do on this front.
>
> On Sat, Sep 26, 2015 at 7:16 PM, Luke Han <[email protected]> wrote:
>
>> Thanks Daniel, I think that's most efficient way to have Roaring
>> work in existing code, patch is really be appreciated :)
>>
>> It's great discussion in KYLIN-1034.
>>
>> Thanks.
>>
>>
>> Best Regards!
>> ---------------------
>>
>> Luke Han
>>
>> On Sat, Sep 26, 2015 at 9:59 PM, Daniel Lemire <[email protected]> wrote:
>>
>>> Good day Luke,
>>>
>>> May I ask you a favor to bring it into our source code? We could work
>>>> together to make it work for
>>>> our current cases and then run some benchmark with real case.
>>>>
>>>
>>> We can rather easily substitute Roaring for Concise in the source code.
>>> Then submit a patch.
>>> Is that what would move this along most efficiently?
>>>
>>>
>>> Meanwhile, we have been flushing out some of the issues on JIRA :
>>>
>>> https://issues.apache.org/jira/browse/KYLIN-1034
>>>
>>> Some of these issues (e.g., memory-file mapping) might be of general
>>> interest.
>>>
>>> - Daniel
>>>
>>
>>
>

Reply via email to