Hi Daniel,
The patch looks very great, will ask Yang to help review and merge if
there's no issue.
Thank you very much for your contribution.
Best Regards!
---------------------
Luke Han
On Mon, Sep 28, 2015 at 11:12 PM, Daniel Lemire <[email protected]> wrote:
> Good day Luke,
>
> A patch has been added to the JIRA :
>
> https://issues.apache.org/jira/browse/KYLIN-1034
>
> I have also issued a PR on GitHub:
>
> https://github.com/apache/incubator-kylin/pull/12
>
> The patch is straight-forward, and simply replaces Concise by Roaring
> throughout.
>
> The relevant unit tests appear to pass.
>
> Further review, testing and benchmarking is encouraged. The purpose of
> this patch
> is to get the process started.
>
> To keep things simple, I did not do *any* redesign. Still... here are my
> thoughts...
>
> Design-wise : It does look to me like the bitmaps are serialized to
> streams of bytes.
> From there, *immutable* bitmaps are reloaded on demand, then possibly
> copied and modified.
> The Roaring library has a class ideally suited for this purpose, called
> ImmutableRoaringBitmap...
> From any ByteBuffer, you can map directly a bitmap :
>
> https://github.com/lemire/RoaringBitmap/blob/master/examples/ImmutableRoaringBitmapExample.java
> Compared to deserializing a bitmap from a stream of bytes, this approach
> avoids copying
> and parsing the data: constructing an ImmutableRoaringBitmap is very fast
> and uses very
> little memory. Because they are formally immutable, you only need one
> instance in your entire
> application, irrespective of the number of cores. The data is accessed
> only when the
> ImmutableRoaringBitmap is actually queried, and what is accessed is the
> original stream of
> bytes (no unnecessary copy is made). So it uses less memory.
>
> Making us of ImmutableRoaringBitmap and mapped bitmaps in kylin would not
> be difficult,
> programming-wise, but this would make the patch more difficult to review.
>
> (I'll recopy some of my comments on JIRA.)
>
>
> As usual, the copyright of this patch and be assigned to whoever... should
> you choose
> to use it. This patch or the Roaring library itself are *not* covered by
> patents. And
> so forth.
>
>
>
> On Sun, Sep 27, 2015 at 2:03 PM, Daniel Lemire <[email protected]> wrote:
>
>> Thanks for clarifying.
>>
>> Let me see what we can do on this front.
>>
>> On Sat, Sep 26, 2015 at 7:16 PM, Luke Han <[email protected]> wrote:
>>
>>> Thanks Daniel, I think that's most efficient way to have Roaring
>>> work in existing code, patch is really be appreciated :)
>>>
>>> It's great discussion in KYLIN-1034.
>>>
>>> Thanks.
>>>
>>>
>>> Best Regards!
>>> ---------------------
>>>
>>> Luke Han
>>>
>>> On Sat, Sep 26, 2015 at 9:59 PM, Daniel Lemire <[email protected]> wrote:
>>>
>>>> Good day Luke,
>>>>
>>>> May I ask you a favor to bring it into our source code? We could work
>>>>> together to make it work for
>>>>> our current cases and then run some benchmark with real case.
>>>>>
>>>>
>>>> We can rather easily substitute Roaring for Concise in the source code.
>>>> Then submit a patch.
>>>> Is that what would move this along most efficiently?
>>>>
>>>>
>>>> Meanwhile, we have been flushing out some of the issues on JIRA :
>>>>
>>>> https://issues.apache.org/jira/browse/KYLIN-1034
>>>>
>>>> Some of these issues (e.g., memory-file mapping) might be of general
>>>> interest.
>>>>
>>>> - Daniel
>>>>
>>>
>>>
>>
>