Hi all,
Please check out the final deck, which was presented at
Strata+Hadoop World London:
http://www.slideshare.net/lukehan/apache-kylin-extreme-olap-engine-for-big-data
Thanks.
Luke
Best Regards!
---------------------
Luke Han
2015-05-05 21:23 GMT+01:00 Li Yang <[email protected]>:
> Hi Xu,
>
> Let me answer the grid table questions. The slides use inverted index as
> an example but don't explain how a cuboid is modeled by a grid table.
>
> 1. Grid table assumes records are ordered by some key. The key can be a
> timestamp but does not have to be. When modeling a cuboid, all dimensions
> of the cuboid become a composite order key.
>
> 2. The row block is the unit of row-level filtering; together with the
> inverted index, it brings random access to the storage. However, row blocks
> can also be disabled if the storage is designed for sequential scans. Cuboid
> is such a case.
>
> 3. The size of the row block is customizable. It's like a slider that lets
> you adjust where to stand between random access and sequential read. The
> smaller the row block, the better the term filtering and the more random
> access. The larger the row block, the weaker the term filtering and the more
> sequential read. For storage like HBase, which is better at sequential
> reads, we tend to use a big block size, and the resulting rowkey will be
> very similar to your suggestion ("coarse timestamp + term + fine
> timestamp"). A row block as big as a whole day of data is a good idea. We
> can benchmark HBase to decide on a more precise value.
>
> 4. About TopN: currently II and cube can both do TopN without problems.
> However, there's a lot of room for improvement. For II, row skipping is not
> implemented yet; it depends on grid table being ready. For cube, there's no
> plan to make TopN a metric yet. Even if we did that one day, the metric
> would be huge, like HyperLogLog: think about ascending/descending, which
> column to order by, how large the limit is, etc. The number of combinations
> is very large. The sort and limit push-down is still an open JIRA.
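
To make the row block "slider" in points 1-3 concrete, here is a minimal
sketch in Python. It is not Kylin code: the function names, the
(timestamp, term) record layout, and the per-block term set standing in for
the inverted index are all assumptions made for illustration only.

```python
def build_row_blocks(records, block_size):
    """Order records by their composite key and cut them into row blocks.

    Each block keeps the set of terms it contains, a toy stand-in for a
    per-block inverted index (hypothetical layout, not Kylin's).
    """
    ordered = sorted(records)  # tuples sort by (timestamp, term)
    blocks = []
    for start in range(0, len(ordered), block_size):
        rows = ordered[start:start + block_size]
        blocks.append({"rows": rows, "terms": {term for _, term in rows}})
    return blocks


def scan(blocks, term):
    """Return (matching rows, blocks read, rows read) for one term filter."""
    hits, blocks_read, rows_read = [], 0, 0
    for block in blocks:
        if term not in block["terms"]:
            continue  # whole block skipped thanks to the term index
        blocks_read += 1  # one random access per block we must open
        rows_read += len(block["rows"])  # sequential read inside the block
        hits.extend(row for row in block["rows"] if row[1] == term)
    return hits, blocks_read, rows_read


# 12 ordered records; term "a" appears only at timestamps 0 and 6.
records = [(ts, "a" if ts % 6 == 0 else "b") for ts in range(12)]

for size in (2, 12):
    hits, blocks_read, rows_read = scan(build_row_blocks(records, size), "a")
    print(size, blocks_read, rows_read)
# block size 2:  2 blocks opened, 4 rows read
# block size 12: 1 block opened, 12 rows read
```

With small blocks the filter skips most rows but needs more separate
accesses; with one big block everything is read in a single sequential pass.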
>
> We don't have strong demand for super-fast TopN at the moment. Speeding up
> the cube build and enabling streaming are the highest priorities.
>
> Cheers
> Yang
>
>
> On Mon, May 4, 2015 at 6:36 AM, Luke Han <[email protected]> wrote:
>
> > Great suggestion, thanks Ted.
> >
> > Regards!
> > Luke Han
> >
> > _____________________________
> > From: Ted Dunning <[email protected]>
> > Sent: Monday, May 4, 2015 01:07
> > Subject: Re: a few slides for Strata + Hadoop World London 2015
> > To: <[email protected]>
> >
> >
> > One important question that came up after I talked about Kylin at Hadoop
> > Summit was regarding security. Having a few slides on that may be good in
> > any presentation.
> >
> >
> >
> > On Sun, May 3, 2015 at 4:16 PM, Luke Han <[email protected]> wrote:
> >
> > > Thanks Yang, really great and detailed on the ongoing lambda architecture.
> > >
> > > Will merge it into our deck. Let's discuss the details on the plane
> > > tomorrow :-)
> > >
> > > Thanks.
> > >
> > >
> > > Best Regards!
> > > ---------------------
> > >
> > > Luke Han
> > >
> > > 2015-05-02 8:20 GMT+08:00 Li Yang <[email protected]>:
> > >
> > > > Hi Luke
> > > >
> > > > I created a few slides for Strata + Hadoop World London 2015 next week,
> > > > see attached. Let's see how they merge with previous deck.
> > > >
> > > > Some should be attached to the related JIRAs as design docs. I'll do it later.
> > > >
> > > > Cheers
> > > > Yang
> > > >
> > >
> >
>