Good day Luke, A patch has been added to the JIRA :
https://issues.apache.org/jira/browse/KYLIN-1034 I have also issued a PR on GitHub: https://github.com/apache/incubator-kylin/pull/12 The patch is straight-forward, and simply replaces Concise by Roaring throughout. The relevant unit tests appear to pass. Further review, testing and benchmarking is encouraged. The purpose of this patch is to get the process started. To keep things simple, I did not do *any* redesign. Still... here are my thoughts... Design-wise : It does look to me like the bitmaps are serialized to streams of bytes. >From there, *immutable* bitmaps are reloaded on demand, then possibly copied and modified. The Roaring library has a class ideally suited for this purpose, called ImmutableRoaringBitmap... >From any ByteBuffer, you can map directly a bitmap : https://github.com/lemire/RoaringBitmap/blob/master/examples/ImmutableRoaringBitmapExample.java Compared to deserializing a bitmap from a stream of bytes, this approach avoids copying and parsing the data: constructing an ImmutableRoaringBitmap is very fast and uses very little memory. Because they are formally immutable, you only need one instance in your entire application, irrespective of the number of cores. The data is accessed only when the ImmutableRoaringBitmap is actually queried, and what is accessed is the original stream of bytes (no unnecessary copy is made). So it uses less memory. Making us of ImmutableRoaringBitmap and mapped bitmaps in kylin would not be difficult, programming-wise, but this would make the patch more difficult to review. (I'll recopy some of my comments on JIRA.) As usual, the copyright of this patch and be assigned to whoever... should you choose to use it. This patch or the Roaring library itself are *not* covered by patents. And so forth. On Sun, Sep 27, 2015 at 2:03 PM, Daniel Lemire <[email protected]> wrote: > Thanks for clarifying. > > Let me see what we can do on this front. > > On Sat, Sep 26, 2015 at 7:16 PM, Luke Han <[email protected]> wrote: > >> Thanks Daniel, I think that's most efficient way to have Roaring >> work in existing code, patch is really be appreciated :) >> >> It's great discussion in KYLIN-1034. >> >> Thanks. >> >> >> Best Regards! >> --------------------- >> >> Luke Han >> >> On Sat, Sep 26, 2015 at 9:59 PM, Daniel Lemire <[email protected]> wrote: >> >>> Good day Luke, >>> >>> May I ask you a favor to bring it into our source code? We could work >>>> together to make it work for >>>> our current cases and then run some benchmark with real case. >>>> >>> >>> We can rather easily substitute Roaring for Concise in the source code. >>> Then submit a patch. >>> Is that what would move this along most efficiently? >>> >>> >>> Meanwhile, we have been flushing out some of the issues on JIRA : >>> >>> https://issues.apache.org/jira/browse/KYLIN-1034 >>> >>> Some of these issues (e.g., memory-file mapping) might be of general >>> interest. >>> >>> - Daniel >>> >> >> >
