Hi, can you try rebuilding the cube with a new measure, TopN?

2017-03-17 17:58 GMT+00:00 Li Yang <[email protected]>:

> You didn't mention the Kylin version. Seems to be 1.6 from the
> configuration property.
>
> The properties related to region number are (note the names are slightly
> different in 1.6):
> kylin.storage.hbase.region-cut-gb=5
> kylin.storage.hbase.min-region-count=1
> kylin.storage.hbase.max-region-count=500
>
> As to the query, it is a simple OLAP query and should be lightning fast if
> you have the right cube and model. This talk on Apache Kylin 2.0 touches a
> bit on TPC-H on Kylin, which may give you ideas.
>
> The rowkey order also has an impact, as HBase does not have secondary
> indexes. You want "d_moy" and "i_manufact_id" to be at (or near) the head
> of the rowkey to get the best performance for this query.
>
> If you still have problems, there are some online tuning tools for Kylin
> that you can try.
>
> Cheers
> Yang
>
>
> On Fri, Mar 10, 2017 at 1:42 AM, <[email protected]> wrote:
>
> > Hello,
> > I am doing a POC on Kylin cubes. I have built a cube on TPC-DS data
> > (~40GB). The build was successful, but I am facing issues with queries.
> > Simple aggregation queries return results in under a second, but
> > queries with ORDER BY/GROUP BY take too much time. At first, the
> > queries were failing with a timeout error because of the record scan
> > threshold, so I increased the "kylin.query.scan.threshold" value in
> > kylin.properties. That fixed the threshold error, but the queries were
> > then taking around 200 seconds, which is not acceptable because Hive
> > was returning the result in 10 seconds for the same query.
> >
> > I am attaching one of the queries (standard TPC-DS query q3) I am
> > trying to run:
> >
> > SELECT date_dim.d_year, item.i_brand_id, item.i_brand,
> >        sum(facttable.ss_ext_discount_amt) sum_agg
> > FROM store_sales facttable
> > INNER JOIN date_dim date_dim
> >   ON (facttable.ss_sold_date_sk = date_dim.d_date_sk)
> > INNER JOIN item item
> >   ON (facttable.ss_item_sk = item.i_item_sk)
> > WHERE item.i_manufact_id = 783 AND date_dim.d_moy = 11
> > GROUP BY date_dim.d_year, item.i_brand, item.i_brand_id
> > ORDER BY date_dim.d_year, sum_agg DESC, item.i_brand_id
> > LIMIT 100;
> >
> > My cluster details are: 10 nodes (each node has 32 cores, 64GB RAM)
> > with HDP 2.5, HBase 1.1.2.2.5.3.0-37 (fully distributed mode).
> >
> > Just to investigate, I checked the region server logs of all the nodes
> > and found that during query execution only one region server was doing
> > all the work while the others were idle. My cube's HBase table was also
> > showing a region count of 1, so I tried changing the following
> > properties, but still no luck:
> >
> > kylin.hbase.hfile.size.gb=1
> > kylin.hbase.region.count.min=8
> >
> > Please let me know if any other configuration is needed to fix the
> > long query time.
> >
> > Thanks
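
For reference, here is a rough sketch of how the 1.6-style settings Yang lists might look in kylin.properties. The property names and the region-cut and max-region values are the ones quoted in his reply. The minimum of 8 is only an assumption that mirrors the original attempt with kylin.hbase.region.count.min=8, not a recommendation. As far as I know, these settings are read when the HBase table for a segment is created, so an already built cube would likely need a rebuild before the region count changes.

```properties
# Sketch only: 1.6-style names as quoted in Yang's reply.

# Approximate size (GB) of cube data per HBase region (value from the reply).
kylin.storage.hbase.region-cut-gb=5

# Lower bound on regions per cube segment.
# ASSUMPTION: 8 mirrors the value tried with the older property name.
kylin.storage.hbase.min-region-count=8

# Upper bound on regions per cube segment (value from the reply).
kylin.storage.hbase.max-region-count=500
```

After a rebuild, the region count shown for the cube's HBase table should indicate whether the new values took effect.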
