Will definitely call for help when it's needed. Many thanks again!

On Tue, Oct 6, 2015 at 9:09 PM, Daniel Lemire <[email protected]> wrote:
> I will be happy to collaborate. There are other people from the Roaring
> team who are available to help too.
>
> On Tue, Oct 6, 2015 at 7:23 AM, Li Yang <[email protected]> wrote:
>
>> Patch merged into 1.x branch.
>> https://github.com/apache/incubator-kylin/commit/f00c838e6117933e725c3e69f0f30a908541b8a8
>>
>> There shall be major refactoring around inverted-index and realtime OLAP
>> in Q4 or early 2016. Will embrace more from Roaring bitmaps.
>>
>> Many thanks Daniel!
>>
>>
>> On Tue, Sep 29, 2015 at 11:45 PM, Luke Han <[email protected]> wrote:
>>
>>> Hi Daniel,
>>> The patch looks very great, will ask Yang to help review and merge if
>>> there's no issue.
>>>
>>> Thank you very much for your contribution.
>>>
>>>
>>> Best Regards!
>>> ---------------------
>>>
>>> Luke Han
>>>
>>> On Mon, Sep 28, 2015 at 11:12 PM, Daniel Lemire <[email protected]>
>>> wrote:
>>>
>>> > Good day Luke,
>>> >
>>> > A patch has been added to the JIRA :
>>> >
>>> > https://issues.apache.org/jira/browse/KYLIN-1034
>>> >
>>> > I have also issued a PR on GitHub:
>>> >
>>> > https://github.com/apache/incubator-kylin/pull/12
>>> >
>>> > The patch is straight-forward, and simply replaces Concise by Roaring
>>> > throughout.
>>> >
>>> > The relevant unit tests appear to pass.
>>> >
>>> > Further review, testing and benchmarking is encouraged. The purpose of
>>> > this patch is to get the process started.
>>> >
>>> > To keep things simple, I did not do *any* redesign. Still... here are my
>>> > thoughts...
>>> >
>>> > Design-wise : It does look to me like the bitmaps are serialized to
>>> > streams of bytes.
>>> > From there, *immutable* bitmaps are reloaded on demand, then possibly
>>> > copied and modified.
>>> > The Roaring library has a class ideally suited for this purpose, called
>>> > ImmutableRoaringBitmap...
>>> > From any ByteBuffer, you can directly map a bitmap :
>>> >
>>> > https://github.com/lemire/RoaringBitmap/blob/master/examples/ImmutableRoaringBitmapExample.java
>>> >
>>> > Compared to deserializing a bitmap from a stream of bytes, this approach
>>> > avoids copying and parsing the data: constructing an
>>> > ImmutableRoaringBitmap is very fast and uses very little memory. Because
>>> > they are formally immutable, you only need one instance in your entire
>>> > application, irrespective of the number of cores. The data is accessed
>>> > only when the ImmutableRoaringBitmap is actually queried, and what is
>>> > accessed is the original stream of bytes (no unnecessary copy is made).
>>> > So it uses less memory.
>>> >
>>> > Making use of ImmutableRoaringBitmap and mapped bitmaps in Kylin would
>>> > not be difficult, programming-wise, but this would make the patch more
>>> > difficult to review.
>>> >
>>> > (I'll recopy some of my comments on JIRA.)
>>> >
>>> >
>>> > As usual, the copyright of this patch can be assigned to whoever...
>>> > should you choose to use it. This patch or the Roaring library itself
>>> > are *not* covered by patents. And so forth.
>>> >
>>> >
>>> > On Sun, Sep 27, 2015 at 2:03 PM, Daniel Lemire <[email protected]> wrote:
>>> >
>>> >> Thanks for clarifying.
>>> >>
>>> >> Let me see what we can do on this front.
>>> >>
>>> >> On Sat, Sep 26, 2015 at 7:16 PM, Luke Han <[email protected]> wrote:
>>> >>
>>> >>> Thanks Daniel, I think that's the most efficient way to have Roaring
>>> >>> work in existing code, a patch would really be appreciated :)
>>> >>>
>>> >>> It's a great discussion in KYLIN-1034.
>>> >>>
>>> >>> Thanks.
>>> >>>
>>> >>>
>>> >>> Best Regards!
>>> >>> ---------------------
>>> >>>
>>> >>> Luke Han
>>> >>>
>>> >>> On Sat, Sep 26, 2015 at 9:59 PM, Daniel Lemire <[email protected]> wrote:
>>> >>>
>>> >>>> Good day Luke,
>>> >>>>
>>> >>>>> May I ask you a favor to bring it into our source code? We could work
>>> >>>>> together to make it work for our current cases and then run some
>>> >>>>> benchmark with real case.
>>> >>>>
>>> >>>> We can rather easily substitute Roaring for Concise in the source code.
>>> >>>> Then submit a patch.
>>> >>>> Is that what would move this along most efficiently?
>>> >>>>
>>> >>>>
>>> >>>> Meanwhile, we have been fleshing out some of the issues on JIRA :
>>> >>>>
>>> >>>> https://issues.apache.org/jira/browse/KYLIN-1034
>>> >>>>
>>> >>>> Some of these issues (e.g., memory-file mapping) might be of general
>>> >>>> interest.
>>> >>>>
>>> >>>> - Daniel
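
For reference, the "map a bitmap directly from a ByteBuffer" idea in the quoted Sep 28 message can be sketched roughly as below. This is only a toy illustration under stated assumptions: MappedBitsetView and its methods are hypothetical names made up for this note, the layout is a plain uncompressed bitset rather than Roaring's serialized format, and nothing here is Roaring or Kylin code. It only shows the shape of the approach: keep a read-only view over the original bytes and touch them only when queried.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical, illustration-only class (NOT part of Roaring or Kylin).
 * It answers queries straight from a ByteBuffer holding an uncompressed
 * bitset, without copying or parsing the bytes up front.
 */
final class MappedBitsetView {
    // Read-only view that shares memory with the caller's buffer.
    private final ByteBuffer bytes;

    MappedBitsetView(ByteBuffer source) {
        // slice() and asReadOnlyBuffer() create views; no bytes are copied.
        this.bytes = source.slice().asReadOnlyBuffer();
    }

    /** Reads only the single byte that holds bit i, via an absolute get. */
    boolean contains(int i) {
        if (i < 0) {
            return false;
        }
        int byteIndex = i >>> 3;
        if (byteIndex >= bytes.limit()) {
            return false;
        }
        return (bytes.get(byteIndex) & (1 << (i & 7))) != 0;
    }

    /** Counts set bits by scanning the underlying bytes in place. */
    int cardinality() {
        int count = 0;
        for (int k = 0; k < bytes.limit(); k++) {
            count += Integer.bitCount(bytes.get(k) & 0xFF);
        }
        return count;
    }

    /**
     * Stand-in for "serialized bitmap bytes": writes the given values into
     * a plain bitset. Values must be in the range [0, capacityInBits).
     */
    static ByteBuffer serialize(int capacityInBits, int... values) {
        byte[] raw = new byte[(capacityInBits + 7) >>> 3];
        for (int v : values) {
            raw[v >>> 3] |= (byte) (1 << (v & 7));
        }
        return ByteBuffer.wrap(raw);
    }
}
```

Usage would look like `new MappedBitsetView(MappedBitsetView.serialize(64, 1, 9, 63)).contains(9)`. With the Roaring library itself, the corresponding step is to build an ImmutableRoaringBitmap over the ByteBuffer, as in the ImmutableRoaringBitmapExample.java linked in the quoted message.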
