Hello,

I am doing a POC on Kylin cubes. I have built a cube on TPC-DS data (~40 GB). The build was successful, but I am facing issues with queries. Simple aggregation queries return results in under a second, but queries with ORDER BY/GROUP BY take far too long.

At first, the queries were failing with a timeout error because of the record scan threshold, so I increased the "kylin.query.scan.threshold" value in kylin.properties. That fixed the threshold error, but the queries then took around 200 seconds, which is not acceptable because Hive returns the result for the same query in 10 seconds.

Here is one of the queries I am trying to run (standard TPC-DS query q3):

SELECT date_dim.d_year, item.i_brand_id, item.i_brand, sum(facttable.ss_ext_discount_amt) sum_agg
FROM store_sales facttable
INNER JOIN date_dim date_dim ON (facttable.ss_sold_date_sk = date_dim.d_date_sk)
INNER JOIN item item ON (facttable.ss_item_sk = item.i_item_sk)
WHERE item.i_manufact_id = 783 AND date_dim.d_moy = 11
GROUP BY date_dim.d_year, item.i_brand, item.i_brand_id
ORDER BY date_dim.d_year, sum_agg DESC, item.i_brand_id
LIMIT 100;

My cluster details: 10 nodes (each with 32 cores and 64 GB RAM) running HDP 2.5, with HBase 1.1.2.2.5.3.0-37 in fully distributed mode.
To investigate, I checked the region server logs on all the nodes and found that during query execution only one region server was doing all the work while the others were idle. My cube's HBase table also showed a region count of 1, so I tried changing the following properties, but still no luck:

kylin.hbase.hfile.size.gb=1
kylin.hbase.region.count.min=8

Please let me know if any other configuration is needed to fix the long query time.

Thanks