Hi All,
Thanks for replying. The problem has been resolved; queries are running in
sub-second time now. The cause was that I had defined the dimensions as
"derived" instead of "normal".
Thanks
On Sunday, 19 March 2017 12:29 AM, Alberto Ramón
<[email protected]> wrote:
Hi
Can you try rebuilding the cube with a new measure, TopN?
2017-03-17 17:58 GMT+00:00 Li Yang <[email protected]>:
> You didn't mention the Kylin version. Seems to be 1.6 from the
> configuration property.
>
> The properties related to region number are (note the names are slightly
> different in 1.6):
> kylin.storage.hbase.region-cut-gb=5
> kylin.storage.hbase.min-region-count=1
> kylin.storage.hbase.max-region-count=500
>
> As to the query, it is a simple OLAP query and should be lightning fast if
> you have the right cube and model. This talk on Apache Kylin 2.0 touches a
> bit on TPC-H on Kylin, which may give you ideas.
>
> The rowkey order also has an impact, as HBase does not have secondary
> indexes. You want "d_moy" and "i_manufact_id" to be at (or near) the head of
> the rowkey to get the best performance for this query.
>
> If you still have problems, there are some online tuning tools for Kylin
> that you can try.
>
> Cheers
> Yang
>
>
> On Fri, Mar 10, 2017 at 1:42 AM, <[email protected]>
> wrote:
>
> > Hello,
> > I am doing a POC on Kylin cubes. I have built a cube on TPC-DS data
> > (~40GB). The build was successful, but I am facing issues with queries.
> > Simple aggregation queries return results in under a second, but
> > queries with ORDER BY/GROUP BY take too much time. At first, the
> > queries were failing with a timeout error because of the record scan
> > threshold, so I increased the "kylin.query.scan.threshold" value in
> > kylin.properties. The threshold error was fixed, but the queries were
> > taking around 200 sec, which is not acceptable because Hive was returning
> > results in 10 seconds for the same query. I am including one of the
> > queries (standard TPC-DS query q3) I am trying to run:
> > SELECT date_dim.d_year, item.i_brand_id, item.i_brand,
> >        sum(facttable.ss_ext_discount_amt) sum_agg
> > FROM store_sales facttable
> > INNER JOIN date_dim date_dim
> >   ON (facttable.ss_sold_date_sk = date_dim.d_date_sk)
> > INNER JOIN item item
> >   ON (facttable.ss_item_sk = item.i_item_sk)
> > WHERE item.i_manufact_id = 783 AND date_dim.d_moy = 11
> > GROUP BY date_dim.d_year, item.i_brand, item.i_brand_id
> > ORDER BY date_dim.d_year, sum_agg DESC, item.i_brand_id
> > LIMIT 100;
> > My cluster details are: 10 nodes (each node has 32 cores, 64GB RAM) with
> > HDP 2.5, HBase 1.1.2.2.5.3.0-37 (fully distributed mode).
> >
> > Just to investigate, I checked the region server logs on all the nodes and
> > found that during query execution only one region server was doing all the
> > work while the others were idle. My cube's HBase table was also showing a
> > region count of 1, so I tried changing the following properties, but still
> > no luck:
> > kylin.hbase.hfile.size.gb=1
> > kylin.hbase.region.count.min=8
> > Please let me know if any other configuration is needed to fix the long
> > query time.
> > Thanks
> >
> >
>
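
To illustrate the rowkey-order advice quoted above, here is a rough sketch of why the filtered dimensions should lead the rowkey. It is only a toy model, not how Kylin or HBase actually encode or scan rows: the dimension value ranges are invented, and the function rows_scanned is hypothetical. It assumes a single contiguous range scan over sorted keys, where only the leading run of equality-filtered dimensions can narrow the range.

```python
from bisect import bisect_left, bisect_right
from itertools import product

# Toy dimension domains (assumed; real TPC-DS cardinalities are far larger).
DOMAINS = {
    "d_year": range(1998, 2003),
    "d_moy": range(1, 13),
    "i_manufact_id": range(700, 800),
}


def rows_scanned(order, filters):
    """Count rows one contiguous range scan must read in this toy model.

    order   -- dimension names in rowkey order
    filters -- equality filters, e.g. {"d_moy": 11}
    """
    # Sorted composite keys stand in for the sorted rowkeys of a table.
    keys = sorted(product(*(DOMAINS[d] for d in order)))
    # Only filtered dimensions at the head of the key narrow the scan range.
    prefix = []
    for dim in order:
        if dim not in filters:
            break
        prefix.append(filters[dim])
    prefix = tuple(prefix)
    # Pad the upper bound so it sorts after every key sharing the prefix.
    upper = prefix + (float("inf"),) * (len(order) - len(prefix))
    return bisect_right(keys, upper) - bisect_left(keys, prefix)


q3_filters = {"d_moy": 11, "i_manufact_id": 783}
head = rows_scanned(("d_moy", "i_manufact_id", "d_year"), q3_filters)
tail = rows_scanned(("d_year", "d_moy", "i_manufact_id"), q3_filters)
print(head, tail)
```

In this model, putting both filtered dimensions first limits the scan to the rows that match the filter, while putting d_year first leaves no usable prefix, so the whole key space is read.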