Re: Group by + where clause

Sarnath Sat, 14 Nov 2015 08:45:06 -0800

Hi ShaoFeng Shi,

Thanks for the info. Yes, I meant the cuboid when I referred to a segment; I
did not know Segment is a separate term in Kylin.
We ran a simple experiment on this and found that this is indeed the case.
We created a Product,Branch cuboid and ran queries projecting
Product, Branch and aggregations while filtering on either Product or Branch.
The filter on Product consistently performed better than the filter on
Branch: the Branch query ran about 1.6x slower than the Product query. This
was on a small synthetic dataset of 10 million entries.
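To illustrate why the leading rowkey column matters, here is a toy Python model. It is not Kylin code, and all names and data in it are hypothetical. A cuboid is treated as a sorted list of (Product, Branch) keys, mirroring how HBase keeps rowkeys sorted, with Product as the first rowkey column. A filter on Product can be turned into one bounded key range. A filter on Branch has no usable range, so every row in the cuboid must be read and filtered.

```python
from bisect import bisect_left, bisect_right
from itertools import product as cartesian

# Hypothetical toy data: every (product, branch) pair, sorted the way HBase
# would order rowkeys if Product were the first rowkey column.
PRODUCTS = [f"P{i:03d}" for i in range(100)]
BRANCHES = [f"B{j:03d}" for j in range(50)]
ROWKEYS = sorted(cartesian(PRODUCTS, BRANCHES))


def scan_by_leading_dim(rowkeys, product_value):
    """Filter on the first rowkey column: the scan is bounded to one key range.

    Returns (matching_rows, rows_scanned).
    """
    # (product_value,) sorts just before every (product_value, branch) key.
    start = bisect_left(rowkeys, (product_value,))
    # "\uffff" sorts after every branch string used in this toy data.
    stop = bisect_right(rowkeys, (product_value, "\uffff"))
    scanned = rowkeys[start:stop]
    return scanned, len(scanned)


def scan_by_trailing_dim(rowkeys, branch_value):
    """Filter on the second rowkey column: no range can be derived, so every
    row in the cuboid is read and the predicate is applied row by row
    (in Kylin/HBase this filtering would run server side).

    Returns (matching_rows, rows_scanned).
    """
    matches = [key for key in rowkeys if key[1] == branch_value]
    return matches, len(rowkeys)


if __name__ == "__main__":
    rows, scanned = scan_by_leading_dim(ROWKEYS, "P042")
    print(f"Product filter: {len(rows)} matches, {scanned} rows scanned")
    rows, scanned = scan_by_trailing_dim(ROWKEYS, "B007")
    print(f"Branch filter: {len(rows)} matches, {scanned} rows scanned")
```

In this toy model, the Product filter reads 50 of 5,000 rows, while the Branch filter reads all 5,000. The real gap depends on cardinalities, data size, and other query overheads, so the model does not predict the 1.6x figure above. It only shows why rowkey order affects how much of a cuboid has to be scanned.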


Best,
Sarnath


On Sat, Nov 14, 2015 at 8:57 PM, ShaoFeng Shi <[email protected]>
wrote:

> Kylin doesn't need a full segment scan. It only needs to scan one cuboid
> (one combination of dimensions), which is a subset of a segment.
>
> If there is a "where" condition in the query, Kylin will try to narrow down
> the scan key range with the given values, but this depends on the order of
> the dimension columns in the rowkey (I think you can understand it). This is
> why the rowkey column order is so important for query performance.
>
> Besides, "where" conditions will be sent to the HBase coprocessor for
> server-side filtering.
>
>
>
> 2015-11-13 18:36 GMT+08:00 Sarnath <[email protected]>:
>
> > Hi All,
> > Does Kylin perform full segment scans for certain GROUP BY queries with a
> > WHERE clause?
> > This, I think, is because of the HBase rowkey design. Can someone confirm
> > my understanding?
> > Best,
> > Sarnath
> >
>
>
>
> --
> Best regards,
>
> Shaofeng Shi
>
