Hi,

We have an internally developed cube engine that uses Elasticsearch as storage. For the experiment below, ES was almost 5x to 8x faster than Kylin (60 ms vs. 240 ms/450 ms). However, we queried ES through its search REST interface directly and compared that with Kylin's REST interface, so I am not sure how much of the gap is SQL translation overhead in Kylin. ES also did not show the rowkey-order-based fluctuations.

The developer also reported considerably lower storage usage, but I want to cross-check that before I say anything definite. Let me cross-check everything next week and see if I can publish a small report. We only have modest infrastructure, so I don't know how this would behave at scale.

On Nov 15, 2015 12:30 PM, "ShaoFeng Shi" <[email protected]> wrote:
> Yes it is expected, and I think this is a balance between space
> and performance; usually we put the more frequently filtered column before
> the less frequently filtered column on the row key, for just this purpose.
>
> I'm not sure whether other K-V storage can provide more power on this. Kylin
> has now been refactored to a plug-in architecture, which makes it possible to
> use other storage for the cube; if you have any idea or suggestion please
> share with us.
>
> 2015-11-15 0:44 GMT+08:00 Sarnath <[email protected]>:
>
> > Hi ShaoFeng Shi,
> >
> > Thanks for the info... Yes, I meant the Cuboid when I referred to Segment.
> > I did not know Segment is a separate keyword in Kylin.
> > We ran a simple experiment on this and found that this is indeed the case.
> > We created a Product,Branch cuboid and ran queries projecting
> > Product,Branch and aggregations while filtering on Product or on Branch.
> > The filter on Product consistently worked better than the filter on Branch.
> > The Branch filter ran almost 1.6x slower than the filter on Product. This
> > was on a small synthetic dataset of 10 million entries.
> >
> > Best,
> > Sarnath
> >
> >
> > On Sat, Nov 14, 2015 at 8:57 PM, ShaoFeng Shi <[email protected]>
> > wrote:
> >
> > > Kylin doesn't need a full segment scan. It only needs to scan one Cuboid
> > > (one combination of dimensions), which is a subset of a segment.
> > >
> > > If there is a "where" condition in the query, Kylin will try to narrow
> > > down the scan key range with the given values, but this depends on the
> > > sequence of the dimensions on the rowkey (I think you can understand it).
> > > This is why the sequence of the rowkey is so important for query
> > > performance.
> > >
> > > Besides, "where" conditions will be sent to the HBase coprocessor to do
> > > server side filtering.
> > >
> > >
> > > 2015-11-13 18:36 GMT+08:00 Sarnath <[email protected]>:
> > >
> > > > Hi All,
> > > > Does Kylin perform full segment scans on certain GROUP BY followed by
> > > > WHERE clause?
> > > > This, I think, is because of the HBase rowkey design. Can someone
> > > > confirm my understanding?
> > > > Best,
> > > > Sarnath
> > > >
> > >
> > >
> > > --
> > > Best regards,
> > >
> > > Shaofeng Shi
> > >
> >
>
>
> --
> Best regards,
>
> Shaofeng Shi
>
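
For anyone following the thread, here is a minimal sketch of why the dimension order on the rowkey changes how many rows a filter has to touch. It is a toy model only: the rowkey is a plain (product, branch) tuple in a sorted Python list, and the names PRODUCTS, BRANCHES, scan_by_product and scan_by_branch are made up for illustration. It is not Kylin's rowkey encoding and does not use any HBase API.

```python
from bisect import bisect_left, bisect_right
from itertools import product

# Toy cuboid: one row per (product, branch), kept sorted by rowkey
# the way a sorted key-value store would keep it. Sizes are arbitrary.
PRODUCTS = [f"P{i:03d}" for i in range(100)]
BRANCHES = [f"B{i:02d}" for i in range(50)]
ROWS = sorted(product(PRODUCTS, BRANCHES))


def scan_by_product(rows, prod):
    """Filter on the leading rowkey dimension: seek to one contiguous range."""
    lo = bisect_left(rows, (prod,))                  # first key with this prefix
    hi = bisect_right(rows, (prod, chr(0x10FFFF)))   # just past the last such key
    return rows[lo:hi], hi - lo                      # (matches, rows touched)


def scan_by_branch(rows, branch):
    """Filter on the trailing dimension: no usable key prefix, visit every row."""
    matches = [r for r in rows if r[1] == branch]
    return matches, len(rows)                        # (matches, rows touched)


if __name__ == "__main__":
    _, touched_product = scan_by_product(ROWS, "P005")
    _, touched_branch = scan_by_branch(ROWS, "B07")
    print("rows touched, filter on product:", touched_product)
    print("rows touched, filter on branch:", touched_branch)
```

In this toy setup the Product filter touches 50 rows and the Branch filter touches all 5000. The real gap reported above (about 1.6x) is much smaller, which is consistent with the engine doing more than a naive scan, for example the server-side filtering in the coprocessor mentioned earlier in the thread.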
