Hi,
We have an internally developed cube engine that uses Elasticsearch as
storage. In the experiment below, ES was roughly 4x to 7.5x faster than
Kylin (60 ms vs. 240 ms / 450 ms). However, we called the ES search REST
interface directly and compared it with Kylin's REST interface, so I am not
sure how much of the gap is Kylin's SQL translation overhead. ES also did not
show the row-key order based fluctuations. The developer also reported
noticeably lower storage usage, but I would cross-check that before drawing
any conclusions.
Let me cross-check everything next week and see if I can publish a small
report. We only have modest infrastructure, so I don't know how it would
behave at scale.
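Here is a toy sketch of the row-key order effect discussed further down in this thread. It assumes a hypothetical (product_id, branch_id) cuboid whose dimension values are encoded as integers and whose rows are kept sorted by key, as HBase keeps rows sorted by rowkey. The function names are made up for illustration, and Kylin's real scan planning and coprocessor filtering are more involved. The point is only this: a filter on the leading rowkey column maps to one contiguous key range, while a filter on the trailing column has to read every row of the cuboid.

```python
import bisect
import itertools

# Toy model (hypothetical, not Kylin's code): a (product_id, branch_id) cuboid
# with 100 products x 50 branches, rows kept sorted by rowkey.
ROWS = sorted(itertools.product(range(100), range(50)))


def filter_on_leading(rows, product_id):
    """WHERE product = ?: matching keys are contiguous, so binary-search the range."""
    start = bisect.bisect_left(rows, (product_id,))    # first key >= (product_id,)
    end = bisect.bisect_left(rows, (product_id + 1,))  # first key of the next product
    # Only rows[start:end] are read, so rows scanned == rows matched.
    return rows[start:end], end - start


def filter_on_trailing(rows, branch_id):
    """WHERE branch = ?: matching keys are scattered, so read and filter every row."""
    matches = [r for r in rows if r[1] == branch_id]
    return matches, len(rows)


if __name__ == "__main__":
    _, scanned_leading = filter_on_leading(ROWS, 7)
    _, scanned_trailing = filter_on_trailing(ROWS, 7)
    print(scanned_leading, scanned_trailing)  # 50 5000
```

With 100 products and 50 branches, the leading-column filter touches 50 rows while the trailing-column filter reads all 5000. The toy model ignores server-side filtering and every other overhead, so its counts should not be read as expected speedups. The slowdown we actually measured in the Product/Branch experiment was about 1.6x.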
On Nov 15, 2015 12:30 PM, "ShaoFeng Shi" <[email protected]> wrote:

> Yes, it is expected. I think this is a trade-off between space and
> performance; that is exactly why we usually put the more frequently
> filtered column before the less frequently filtered column in the row key.
>
> I'm not sure whether other K-V stores can do better here. Kylin has now
> been refactored to a plug-in architecture, which makes it possible to use
> other storage for cubes; if you have any ideas or suggestions, please share
> them with us.
>
> 2015-11-15 0:44 GMT+08:00 Sarnath <[email protected]>:
>
> > Hi ShaoFeng Shi,
> >
> > Thanks for the info. Yes, I meant Cuboid when I referred to Segment; I
> > did not know Segment is a separate term in Kylin.
> > We ran a simple experiment on this and found that this is indeed the
> > case. We created a Product,Branch cuboid and ran queries projecting
> > Product, Branch and aggregations while filtering on either Product or
> > Branch. The filter on Product consistently performed better: the filter on
> > Branch ran almost 1.6x slower. This was on a small synthetic dataset of
> > 10 million entries.
> >
> > Best,
> > Sarnath
> >
> >
> > On Sat, Nov 14, 2015 at 8:57 PM, ShaoFeng Shi <[email protected]> wrote:
> >
> > > Kylin doesn't need a full segment scan. It only needs to scan one Cuboid
> > > (one combination of dimensions), which is a subset of a segment.
> > >
> > > If there is a "where" condition in the query, Kylin will try to narrow
> > > down the scan key range with the given values, but this depends on the
> > > order of the dimension columns in the rowkey (I think you can see why).
> > > This is why the rowkey column order is so important for query
> > > performance.
> > >
> > > Besides, "where" conditions are also sent to the HBase coprocessor for
> > > server-side filtering.
> > >
> > >
> > >
> > > 2015-11-13 18:36 GMT+08:00 Sarnath <[email protected]>:
> > >
> > > > Hi All,
> > > > Does Kylin perform full segment scans for certain queries that combine
> > > > GROUP BY with a WHERE clause?
> > > > I think this is because of the HBase rowkey design. Can someone confirm
> > > > my understanding?
> > > > Best,
> > > > Sarnath
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > >
> > > Shaofeng Shi
> > >
> >
>
>
>
> --
> Best regards,
>
> Shaofeng Shi
>
