Hi ShaoFeng Shi,

Thanks for the info. Yes, I meant the cuboid when I referred to the segment; I did not know that "segment" is a separate term in Kylin.

We ran a simple experiment on this and found that this is indeed the case: the order of the dimensions in the rowkey affects query performance. We created a (Product, Branch) cuboid and ran queries that project Product, Branch and the aggregations while filtering on either Product or Branch. The filter on Product consistently performed better: the query filtering on Branch ran almost 1.6x slower than the one filtering on Product. This was on a small synthetic dataset of 10 million entries.
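To illustrate why the leading dimension filters better, here is a toy Python model. It is not Kylin or HBase code: it assumes the rowkey is simply the sorted pair (product, branch) with Product first, and all function names are made up for this sketch. It only counts how many rows each filter has to touch, so it says nothing about the actual 1.6x timing we measured.

```python
from bisect import bisect_left, bisect_right


def build_rowkeys(n_products, n_branches):
    """Toy cuboid: one row per (product, branch), sorted like a rowkey
    whose first dimension is product (an assumption of this sketch)."""
    return sorted((p, b) for p in range(n_products) for b in range(n_branches))


def scan_filter_on_product(rowkeys, product):
    """Filter on the leading dimension: matching rows are contiguous,
    so the scan can be limited to a start/stop key range."""
    lo = bisect_left(rowkeys, (product, float("-inf")))
    hi = bisect_right(rowkeys, (product, float("inf")))
    scanned = hi - lo
    matched = scanned  # every row in the range matches
    return scanned, matched


def scan_filter_on_branch(rowkeys, branch):
    """Filter on the trailing dimension: matching rows are scattered,
    so this toy model walks the whole cuboid and filters row by row."""
    scanned = 0
    matched = 0
    for _product, b in rowkeys:
        scanned += 1
        if b == branch:
            matched += 1
    return scanned, matched


if __name__ == "__main__":
    keys = build_rowkeys(n_products=100, n_branches=50)
    print("filter on product:", scan_filter_on_product(keys, 7))
    print("filter on branch: ", scan_filter_on_branch(keys, 7))
```

In this model the Product filter touches only the rows for that product, while the Branch filter touches every row in the cuboid even though few of them match. The real system also does server-side filtering in the coprocessor, which this sketch ignores.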
Best,
Sarnath

On Sat, Nov 14, 2015 at 8:57 PM, ShaoFeng Shi <[email protected]> wrote:

> Kylin doesn't need a full segment scan. It only needs to scan one cuboid (one
> combination of dimensions), which is a subset of a segment.
>
> If there is a "where" condition in the query, Kylin will try to narrow down the
> scan key range with the given values, but this depends on the sequence of
> the dimensions in the rowkey (I think you can understand it). This is why
> the sequence of the rowkey is so important for query performance.
>
> Besides, "where" conditions will be sent to the HBase coprocessor to do
> server-side filtering.
>
> 2015-11-13 18:36 GMT+08:00 Sarnath <[email protected]>:
>
> > Hi All,
> > Does Kylin perform full segment scans on certain GROUP BY followed by WHERE
> > clause?
> > This, I think, is because of the HBase rowkey design. Can someone confirm my
> > understanding?
> > Best,
> > Sarnath
>
> --
> Best regards,
>
> Shaofeng Shi
