Re: a few slides for Strata + Hadoop World London 2015

Li Yang Tue, 05 May 2015 13:24:40 -0700

Hi Xu,

Let me answer the grid table questions. The slides use the inverted index as
an example but don't explain how a cuboid is modeled by a grid table.

1. Grid table assumes records are ordered by some key. The key can be a
timestamp but does not have to be. When modeling a cuboid, all dimensions
of the cuboid together become a composite order key.

2. A row block is the unit of row-level filtering; combined with the
inverted index, it brings random access to the storage. However, row blocks
can also be disabled if the storage is designed for sequential scan. A
cuboid is such a case.

3. The size of a row block is customizable. It works like a slider that lets
you choose where to stand between random access and sequential read. The
smaller the row block, the better the term filtering and the more random
the access. The larger the row block, the coarser the term filtering and the
more sequential the read. For storage like HBase, which is better at
sequential read, we tend to use a big block size, and the resulting rowkey
will be very similar to your suggestion ("coarse timestamp + term + fine
timestamp"). A row block as big as a whole day of data is a good idea. We
can benchmark HBase to decide a more precise value.

4. About TopN: currently both II and cube can do TopN without problems.
However, there is a lot of room for improvement. For II, row skipping is not
implemented yet; it depends on the grid table being ready. For cube, there
is no plan to make TopN a metric yet. Even if we did that one day, the
metric would be huge, like HyperLogLog: think about ascending/descending,
which column to order by, how much to limit, etc. The number of
combinations is very large. The sort and limit push-down is still an open
JIRA.
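
To make point 3 concrete, here is a minimal sketch of how a "coarse
timestamp + term + fine timestamp" rowkey could be composed. It is an
illustration only, not Kylin code: the function name `rowkey`, the one-day
block size, the UTC alignment, and the byte layout are all assumptions.

```python
from datetime import datetime, timezone

# Hypothetical row block size: one whole day of data, as suggested above.
DAY_SECONDS = 24 * 60 * 60


def rowkey(ts: datetime, term: str, block_seconds: int = DAY_SECONDS) -> bytes:
    """Compose 'coarse timestamp + term + fine timestamp' as a sortable key.

    Assumes `ts` is timezone-aware, `term` contains no NUL byte, and
    `block_seconds` fits in 4 bytes.
    """
    epoch = int(ts.timestamp())
    coarse = epoch - epoch % block_seconds  # start of the row block
    fine = epoch - coarse                   # offset inside the row block
    # Fixed-width big-endian integers compare bytewise in numeric order,
    # so keys sort by row block first, then term, then time within block.
    return (
        coarse.to_bytes(8, "big")
        + term.encode("utf-8")
        + b"\x00"                           # terminator after the variable-length term
        + fine.to_bytes(4, "big")
    )


if __name__ == "__main__":
    utc = timezone.utc
    keys = sorted(
        [
            rowkey(datetime(2015, 5, 6, 0, 30, tzinfo=utc), "uk"),
            rowkey(datetime(2015, 5, 5, 12, 0, tzinfo=utc), "us"),
            rowkey(datetime(2015, 5, 5, 23, 0, tzinfo=utc), "uk"),
            rowkey(datetime(2015, 5, 5, 1, 0, tzinfo=utc), "uk"),
        ]
    )
    for k in keys:
        print(k.hex())
```

With this layout, all rows for one term inside one row block are adjacent,
so a term filter becomes one sequential range scan per block. Shrinking
`block_seconds` moves the slider toward finer filtering and more seeks.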

We don't have strong demand for super fast TopN at the moment. Speeding up
the cube build and enabling streaming are the highest priorities.

Cheers
Yang

On Mon, May 4, 2015 at 6:36 AM, Luke Han <[email protected]> wrote:

> Great suggestion, thanks Ted.
>
> Regards!
> Luke Han
>
>     _____________________________
> From: Ted Dunning <[email protected]>
> Sent: Monday, May 4, 2015 01:07
> Subject: Re: a few slides for Strata + Hadoop World London 2015
> To:  <[email protected]>
>
>
> One important question that came up after I talked about Kylin at Hadoop
> Summit was regarding security.  Having a few slides on that may be good in
> any presentation.
>
>
>
> On Sun, May 3, 2015 at 4:16 PM, Luke Han <[email protected]> wrote:
>
> > Thanks Yang, really great and detailed about the ongoing lambda
> > architecture.
> >
> > Will merge into our deck; let's discuss the details on the plane
> > tomorrow :-)
> >
> > Thanks.
> >
> >
> > Best Regards!
> > ---------------------
> >
> > Luke Han
> >
> > 2015-05-02 8:20 GMT+08:00 Li Yang <[email protected]>:
> >
> > > Hi Luke
> > >
> > > I created a few slides for Strata + Hadoop World London 2015 next week,
> > > see attached. Let's see how they merge with previous deck.
> > >
> > > Some should be attached to the related JIRA as design docs. I'll do it later.
> > >
> > > Cheers
> > > Yang
> > >
> >
>
