I will be happy to collaborate. There are other people from the Roaring team who are available to help too.
On Tue, Oct 6, 2015 at 7:23 AM, Li Yang <[email protected]> wrote:

> Patch merged into 1.x branch.
> https://github.com/apache/incubator-kylin/commit/f00c838e6117933e725c3e69f0f30a908541b8a8
>
> There shall be major refactoring around inverted-index and realtime OLAP
> in Q4 or early 2016. Will embrace more from Roaring bitmaps.
>
> Many thanks Daniel!
>
>
> On Tue, Sep 29, 2015 at 11:45 PM, Luke Han <[email protected]> wrote:
>
>> Hi Daniel,
>> The patch looks very great, will ask Yang to help review and merge if
>> there's no issue.
>>
>> Thank you very much for your contribution.
>>
>> Best Regards!
>> ---------------------
>>
>> Luke Han
>>
>> On Mon, Sep 28, 2015 at 11:12 PM, Daniel Lemire <[email protected]> wrote:
>>
>> > Good day Luke,
>> >
>> > A patch has been added to the JIRA :
>> >
>> > https://issues.apache.org/jira/browse/KYLIN-1034
>> >
>> > I have also issued a PR on GitHub:
>> >
>> > https://github.com/apache/incubator-kylin/pull/12
>> >
>> > The patch is straight-forward, and simply replaces Concise by Roaring
>> > throughout.
>> >
>> > The relevant unit tests appear to pass.
>> >
>> > Further review, testing and benchmarking is encouraged. The purpose of
>> > this patch is to get the process started.
>> >
>> > To keep things simple, I did not do *any* redesign. Still... here are my
>> > thoughts...
>> >
>> > Design-wise : It does look to me like the bitmaps are serialized to
>> > streams of bytes. From there, *immutable* bitmaps are reloaded on demand,
>> > then possibly copied and modified. The Roaring library has a class ideally
>> > suited for this purpose, called ImmutableRoaringBitmap...
>> >
>> > From any ByteBuffer, you can map directly a bitmap :
>> >
>> > https://github.com/lemire/RoaringBitmap/blob/master/examples/ImmutableRoaringBitmapExample.java
>> >
>> > Compared to deserializing a bitmap from a stream of bytes, this approach
>> > avoids copying and parsing the data: constructing an
>> > ImmutableRoaringBitmap is very fast and uses very little memory. Because
>> > they are formally immutable, you only need one instance in your entire
>> > application, irrespective of the number of cores. The data is accessed
>> > only when the ImmutableRoaringBitmap is actually queried, and what is
>> > accessed is the original stream of bytes (no unnecessary copy is made).
>> > So it uses less memory.
>> >
>> > Making use of ImmutableRoaringBitmap and mapped bitmaps in kylin would not
>> > be difficult, programming-wise, but this would make the patch more
>> > difficult to review.
>> >
>> > (I'll recopy some of my comments on JIRA.)
>> >
>> >
>> > As usual, the copyright of this patch can be assigned to whoever... should
>> > you choose to use it. This patch or the Roaring library itself are *not*
>> > covered by patents. And so forth.
>> >
>> >
>> > On Sun, Sep 27, 2015 at 2:03 PM, Daniel Lemire <[email protected]> wrote:
>> >
>> >> Thanks for clarifying.
>> >>
>> >> Let me see what we can do on this front.
>> >>
>> >> On Sat, Sep 26, 2015 at 7:16 PM, Luke Han <[email protected]> wrote:
>> >>
>> >>> Thanks Daniel, I think that's most efficient way to have Roaring
>> >>> work in existing code, patch is really be appreciated :)
>> >>>
>> >>> It's great discussion in KYLIN-1034.
>> >>>
>> >>> Thanks.
>> >>>
>> >>>
>> >>> Best Regards!
>> >>> ---------------------
>> >>>
>> >>> Luke Han
>> >>>
>> >>> On Sat, Sep 26, 2015 at 9:59 PM, Daniel Lemire <[email protected]> wrote:
>> >>>
>> >>>> Good day Luke,
>> >>>>
>> >>>>> May I ask you a favor to bring it into our source code? We could work
>> >>>>> together to make it work for our current cases and then run some
>> >>>>> benchmark with real case.
>> >>>>
>> >>>> We can rather easily substitute Roaring for Concise in the source code.
>> >>>> Then submit a patch.
>> >>>> Is that what would move this along most efficiently?
>> >>>>
>> >>>>
>> >>>> Meanwhile, we have been flushing out some of the issues on JIRA :
>> >>>>
>> >>>> https://issues.apache.org/jira/browse/KYLIN-1034
>> >>>>
>> >>>> Some of these issues (e.g., memory-file mapping) might be of general
>> >>>> interest.
>> >>>>
>> >>>> - Daniel
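
For readers of this thread: the mapping idea in the quoted Sep 28 message (querying a bitmap straight out of a ByteBuffer instead of deserializing it) can be illustrated with a standard-library-only sketch. This is not Roaring's serialization format or API. MappedBitsetView, serialize, contains and cardinality are hypothetical names for a plain uncompressed bitset, used only to show that a read-only view over the original bytes can answer queries without copying or parsing them up front. The ImmutableRoaringBitmapExample.java file linked in the quoted message shows the actual library usage.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical read-only view over a serialized, uncompressed bitset.
 * Assumed layout (an illustration, NOT Roaring's format): consecutive
 * little-endian 64-bit words, where bit i of word w encodes value 64*w + i.
 */
final class MappedBitsetView {
    // Shares storage with the buffer passed to the constructor; nothing is copied.
    private final ByteBuffer words;

    MappedBitsetView(ByteBuffer serialized) {
        // slice() and asReadOnlyBuffer() both create views over the same bytes.
        this.words = serialized.slice().asReadOnlyBuffer().order(ByteOrder.LITTLE_ENDIAN);
    }

    /** Reads exactly one word from the underlying bytes, only when queried. */
    boolean contains(int x) {
        if (x < 0) {
            return false;
        }
        int byteIndex = (x >>> 6) * Long.BYTES;
        if (byteIndex + Long.BYTES > words.limit()) {
            return false;
        }
        long word = words.getLong(byteIndex); // absolute read, no position change
        return (word & (1L << (x & 63))) != 0;
    }

    /** Counts set bits by scanning the underlying bytes in place. */
    int cardinality() {
        int count = 0;
        for (int i = 0; i + Long.BYTES <= words.limit(); i += Long.BYTES) {
            count += Long.bitCount(words.getLong(i));
        }
        return count;
    }

    /** Hypothetical helper that produces bytes in the layout assumed above. */
    static ByteBuffer serialize(int capacityBits, int... values) {
        int wordCount = (capacityBits + 63) >>> 6;
        ByteBuffer out = ByteBuffer.allocate(wordCount * Long.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (int v : values) {
            int at = (v >>> 6) * Long.BYTES;
            out.putLong(at, out.getLong(at) | (1L << (v & 63)));
        }
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer bytes = serialize(2048, 1, 2, 3, 1000);
        MappedBitsetView view = new MappedBitsetView(bytes);
        System.out.println("contains(1000): " + view.contains(1000));
        System.out.println("contains(4): " + view.contains(4));
        System.out.println("cardinality: " + view.cardinality());
    }
}
```

Because the view holds no state beyond the shared buffer and uses only absolute reads, one instance could be queried from several threads, which mirrors the "one instance in your entire application" remark in the quoted message. In a real deployment the ByteBuffer would more likely come from a memory-mapped file than from ByteBuffer.allocate.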
