Patch merged into 1.x branch. https://github.com/apache/incubator-kylin/commit/f00c838e6117933e725c3e69f0f30a908541b8a8
There shall be major refactoring around inverted-index and realtime OLAP in Q4 or early 2016. Will embrace more from Roaring bitmaps. Many thanks Daniel! On Tue, Sep 29, 2015 at 11:45 PM, Luke Han <[email protected]> wrote: > Hi Daniel, > The patch looks very great, will ask Yang to help review and merge if > there's no issue. > > Thank you very much for your contribution. > > > > > Best Regards! > --------------------- > > Luke Han > > On Mon, Sep 28, 2015 at 11:12 PM, Daniel Lemire <[email protected]> wrote: > > > Good day Luke, > > > > A patch has been added to the JIRA : > > > > https://issues.apache.org/jira/browse/KYLIN-1034 > > > > I have also issued a PR on GitHub: > > > > https://github.com/apache/incubator-kylin/pull/12 > > > > The patch is straight-forward, and simply replaces Concise by Roaring > > throughout. > > > > The relevant unit tests appear to pass. > > > > Further review, testing and benchmarking is encouraged. The purpose of > > this patch > > is to get the process started. > > > > To keep things simple, I did not do *any* redesign. Still... here are my > > thoughts... > > > > Design-wise : It does look to me like the bitmaps are serialized to > > streams of bytes. > > From there, *immutable* bitmaps are reloaded on demand, then possibly > > copied and modified. > > The Roaring library has a class ideally suited for this purpose, called > > ImmutableRoaringBitmap... > > From any ByteBuffer, you can map directly a bitmap : > > > > > https://github.com/lemire/RoaringBitmap/blob/master/examples/ImmutableRoaringBitmapExample.java > > Compared to deserializing a bitmap from a stream of bytes, this approach > > avoids copying > > and parsing the data: constructing an ImmutableRoaringBitmap is very fast > > and uses very > > little memory. Because they are formally immutable, you only need one > > instance in your entire > > application, irrespective of the number of cores. The data is accessed > > only when the > > ImmutableRoaringBitmap is actually queried, and what is accessed is the > > original stream of > > bytes (no unnecessary copy is made). So it uses less memory. > > > > Making us of ImmutableRoaringBitmap and mapped bitmaps in kylin would not > > be difficult, > > programming-wise, but this would make the patch more difficult to review. > > > > (I'll recopy some of my comments on JIRA.) > > > > > > As usual, the copyright of this patch and be assigned to whoever... > should > > you choose > > to use it. This patch or the Roaring library itself are *not* covered by > > patents. And > > so forth. > > > > > > > > On Sun, Sep 27, 2015 at 2:03 PM, Daniel Lemire <[email protected]> wrote: > > > >> Thanks for clarifying. > >> > >> Let me see what we can do on this front. > >> > >> On Sat, Sep 26, 2015 at 7:16 PM, Luke Han <[email protected]> wrote: > >> > >>> Thanks Daniel, I think that's most efficient way to have Roaring > >>> work in existing code, patch is really be appreciated :) > >>> > >>> It's great discussion in KYLIN-1034. > >>> > >>> Thanks. > >>> > >>> > >>> Best Regards! > >>> --------------------- > >>> > >>> Luke Han > >>> > >>> On Sat, Sep 26, 2015 at 9:59 PM, Daniel Lemire <[email protected]> > wrote: > >>> > >>>> Good day Luke, > >>>> > >>>> May I ask you a favor to bring it into our source code? We could work > >>>>> together to make it work for > >>>>> our current cases and then run some benchmark with real case. > >>>>> > >>>> > >>>> We can rather easily substitute Roaring for Concise in the source > code. > >>>> Then submit a patch. > >>>> Is that what would move this along most efficiently? > >>>> > >>>> > >>>> Meanwhile, we have been flushing out some of the issues on JIRA : > >>>> > >>>> https://issues.apache.org/jira/browse/KYLIN-1034 > >>>> > >>>> Some of these issues (e.g., memory-file mapping) might be of general > >>>> interest. > >>>> > >>>> - Daniel > >>>> > >>> > >>> > >> > > >
