Thanks Hua. From your tests and detailed explanation, it should be safe to
change the value of the threshold directly and rebuild the code. However,
I still have 2 questions about the threshold. In the code, there are 2
places which set the threshold:
1. long rowEst = MEM_BUDGET_PER_QUERY / rowSizeEst;
   context.setThreshold((int) rowEst);
2. String propThreshold =
       connProps.getProperty(OLAPQuery.PROP_SCAN_THRESHOLD);
   int threshold = Integer.valueOf(propThreshold);
   olapContext.storageContext.setThreshold(threshold);
Does anyone know what the purposes of these 2 settings are?
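A minimal Java sketch of how the two call sites might interact. It assumes, without confirmation from this thread, that `setThreshold` caps its argument at `HARD_THRESHOLD` and that whichever call runs last wins. `StorageContextSketch`, `ThresholdCallSites` and the 3 GiB budget are hypothetical stand-ins, not Kylin's actual classes or values.

```java
import java.util.Properties;

// Hypothetical stand-in for Kylin's StorageContext; only the value of
// HARD_THRESHOLD (4000000) comes from this thread.
class StorageContextSketch {
    static final int HARD_THRESHOLD = 4_000_000;

    private int threshold = HARD_THRESHOLD;

    // Assumption: any requested threshold is capped at HARD_THRESHOLD.
    void setThreshold(int t) {
        threshold = Math.min(t, HARD_THRESHOLD);
    }

    int getThreshold() {
        return threshold;
    }
}

class ThresholdCallSites {
    // Hypothetical budget; the real MEM_BUDGET_PER_QUERY value is not given here.
    static final long MEM_BUDGET_PER_QUERY = 3L * 1024 * 1024 * 1024;

    // Call site 1: derive a row limit from a per-query memory budget.
    static void applyMemoryBudget(StorageContextSketch ctx, long rowSizeEst) {
        long rowEst = MEM_BUDGET_PER_QUERY / rowSizeEst;
        // Clamp before the narrowing cast so a huge estimate cannot wrap around.
        ctx.setThreshold((int) Math.min(rowEst, Integer.MAX_VALUE));
    }

    // Call site 2: an explicit per-connection override, applied only if present.
    static void applyConnectionProperty(StorageContextSketch ctx, Properties connProps, String key) {
        String propThreshold = connProps.getProperty(key);
        if (propThreshold != null) {
            ctx.setThreshold(Integer.parseInt(propThreshold.trim()));
        }
    }
}
```

Under these assumptions, a 1,024-byte row estimate gives 3,145,728 rows from call site 1, while a 100-byte estimate would be capped at 4,000,000. The sketch also guards the `int` cast and the missing-property case, which the original snippets leave unchecked.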
2015-05-18 11:40 GMT+08:00 Li Yang <[email protected]>:
> A query goes through GUI -> Query Engine -> HBase. You can analyze the
> response time of each step.
>
> - GUI, see browser console
> - Query Engine, see the "===[QUERY]===" log lines. In kylinlog1.txt,
> it's 151 seconds
> - HBase, see the "HBase Metrics" log line. However, I didn't see it in the
> log files. Weird.
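To see where the time goes, the log lines mentioned above can be pulled out of KYLIN_HOME/logs/kylin.log (the log path hongbin gives later in the thread) with a small filter. `QueryLogFilter` is a hypothetical helper, not a Kylin tool; only the two marker strings come from this thread, and the number of context lines printed after each marker is an arbitrary assumption because the exact log layout is not shown here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: extract timing-related lines from a Kylin log file.
class QueryLogFilter {
    static final String QUERY_MARKER = "===[QUERY]===";
    static final String HBASE_MARKER = "HBase Metrics";

    // Returns each marker line plus up to `linesAfter` lines that follow it,
    // in case details such as the SQL or duration are logged on later lines.
    static List<String> timingLines(Path logFile, int linesAfter) throws IOException {
        List<String> out = new ArrayList<>();
        int remaining = 0;
        for (String line : Files.readAllLines(logFile)) { // whole file, read as UTF-8
            if (line.contains(QUERY_MARKER) || line.contains(HBASE_MARKER)) {
                out.add(line);
                remaining = linesAfter;
            } else if (remaining > 0) {
                out.add(line);
                remaining--;
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Default path is an assumption; pass KYLIN_HOME/logs/kylin.log explicitly.
        Path log = Paths.get(args.length > 0 ? args[0] : "kylin.log");
        timingLines(log, 5).forEach(System.out::println);
    }
}
```

Reading the whole file keeps the sketch short; for a large production log, a streaming read would be the safer choice.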
>
> Concerning the SQL that pulls all 18 columns, transferring the big result
> alone may already eat up most of the time.
>
> Btw, Kylin is not designed for ETL purposes; we don't expect users to pull
> millions of rows, at least not in usual cases. A result set for
> interactive analysis is typically a few thousand rows.
>
> Cheers
> Yang
>
> On Mon, May 18, 2015 at 10:23 AM, dong wang <[email protected]>
> wrote:
>
> > Thanks Hua. Usually users don't need to fetch 4,000,000+ rows of the
> > result, but an intermediate query result may have far more than
> > 4,000,000 rows. In your reply above, you mentioned that we can just
> > change the value of the setting, then rebuild the code and restart
> > Tomcat. Is that what you have already tested? Since there is currently
> > so much data in the existing cubes, I have to make sure that all such
> > operations are safe to perform.
> >
> > 2015-05-15 21:29 GMT+08:00 Adunuthula, Seshu <[email protected]>:
> >
> > > As a short-term fix, does it make sense to make this a tunable parameter
> > > and move it to a config file?
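Seshu's suggestion could look roughly like this: the threshold is read from a properties file, falling back to the current hard-coded 4,000,000 when the key is absent. The key name `kylin.query.scan.threshold` and the `ScanThresholdConfig` class are hypothetical, not an existing Kylin setting.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

// Hypothetical sketch: load the scan threshold from configuration instead of
// hard-coding it in StorageContext.
class ScanThresholdConfig {
    static final String KEY = "kylin.query.scan.threshold"; // hypothetical key name
    static final int DEFAULT_THRESHOLD = 4_000_000;         // today's hard-coded value

    static int load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        String raw = props.getProperty(KEY);
        if (raw == null) {
            return DEFAULT_THRESHOLD; // key absent: keep current behaviour
        }
        int value = Integer.parseInt(raw.trim());
        if (value <= 0) {
            throw new IllegalArgumentException(KEY + " must be positive, got " + value);
        }
        return value;
    }
}
```

Whether the value is read once at startup or per query is a design choice; reading it once at startup would still need a Tomcat restart after editing the file, but no rebuild of the war.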
> > >
> > > On 5/15/15, 5:58 AM, "Huang Hua" <[email protected]> wrote:
> > >
> > > >Hi Dong,
> > > >
> > > >I don't think so. You can safely change that setting, but then you need
> > > >to recompile Kylin to generate the new war (don't use deploy.sh, because
> > > >that will wipe out all of your Kylin HBase metadata storage). After the
> > > >war is generated, put it under Tomcat's webapps directory and restart
> > > >Tomcat. That should work well.
> > > >
> > > >Best.
> > > >Hua
> > > >> -----Original Message-----
> > > >> From: dev-return-1698-[email protected]
> > > >> [mailto:dev-return-[email protected]] On Behalf Of dong wang
> > > >> Sent: May 15, 2015 18:54
> > > >> To: [email protected]
> > > >> Subject: Re: Increase query performance
> > > >>
> > > >> I found that the threshold is set in StorageContext.java; the
> > > >> related code is:
> > > >> public class StorageContext {
> > > >>
> > > >>     public static final int HARD_THRESHOLD = 4000000;
> > > >>     // ...
> > > >> }
> > > >>
> > > >> So my question: I have already built some segments successfully.
> > > >> If I later change the threshold to a much greater value, will it
> > > >> affect the existing data in the cube storage?
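One way to see why raising the threshold should not touch existing segments, assuming (as Hua's reply suggests) that it is purely a query-time limit: the check is just a counter applied while rows stream out of storage. `ScanGuard` is a hypothetical illustration, not Kylin's implementation; only the error text is taken from this thread.

```java
import java.util.Iterator;
import java.util.function.Consumer;

// Hypothetical scan-time guard: counts rows as they are read and aborts once
// the count passes the threshold. Stored cube data is never modified.
class ScanGuard {
    static <T> long scan(Iterator<T> rows, int threshold, Consumer<T> sink) {
        long count = 0;
        while (rows.hasNext()) {
            T row = rows.next();
            count++;
            if (count > threshold) {
                // Message text copied from the error reported in this thread.
                throw new IllegalStateException("Scan row count exceeded threshold: " + threshold
                        + ", please add filter condition to narrow down backend scan range, like where clause.");
            }
            sink.accept(row);
        }
        return count;
    }
}
```

Because the limit lives only in the read path, changing it affects which queries are allowed to finish, not how segments were built or stored.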
> > > >>
> > > >> 2015-05-15 18:48 GMT+08:00 dong wang <[email protected]>:
> > > >>
> > > >> > Hi all, today I also met the same problem, but mine may be even
> > > >> > stranger. The SQL is the following:
> > > >> > select count(*) from (select 1 from test1 where conditionx group by
> > > >> > col1, col2, col3) t1
> > > >> >
> > > >> > Since the result of the subquery is greater than 4000000 rows, the
> > > >> > exception is thrown. However, the final result of the whole SQL is
> > > >> > just 1 row. This kind of SQL is usually used to obtain the total row
> > > >> > count of a query for a paging feature.
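A simplified illustration of why this query trips the limit even though it returns a single row, assuming the limit is applied to the rows that storage hands to the query engine: every distinct `(col1, col2, col3)` group has to be produced before the outer `count(*)` can run. `GroupCountSketch` is hypothetical and ignores how Kylin actually scans cuboids.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of:
//   select count(*) from (select 1 from test1 where conditionx
//                         group by col1, col2, col3) t1
class GroupCountSketch {
    // Each row is {col1, col2, col3}, already filtered by the where clause.
    static long countGroups(List<String[]> rows, int threshold) {
        Set<List<String>> groups = new HashSet<>();
        for (String[] r : rows) {
            groups.add(Arrays.asList(r[0], r[1], r[2]));
            // The inner result grows with the number of groups, not with the
            // size of the final answer, so the limit is hit here.
            if (groups.size() > threshold) {
                throw new IllegalStateException("Scan row count exceeded threshold: " + threshold);
            }
        }
        // The outer count(*) collapses all groups into a single number.
        return groups.size();
    }
}
```

For a paging total, the cost is therefore driven by the number of groups, which is why the error message suggests narrowing the where clause.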
> > > >> >
> > > >> > 2015-05-13 18:15 GMT+08:00 Parkavi Nandagopal <[email protected]>:
> > > >> >
> > > >> >> After getting the error below (Scan row count exceeded threshold:
> > > >> >> 4000000), Kylin stopped/crashed automatically.
> > > >> >> Is Kylin a single point of failure?
> > > >> >> How can it be made highly available?
> > > >> >>
> > > >> >> Thanks,
> > > >> >> Parkavi.
> > > >> >>
> > > >> >>
> > > >> >> -----Original Message-----
> > > >> >> From: Parkavi Nandagopal
> > > >> >> Sent: Wednesday, May 13, 2015 10:49 AM
> > > >> >> To: dev; '[email protected]'
> > > >> >> Subject: RE: Increase query performance
> > > >> >>
> > > >> >> Size of my Hive fact table = 3.27 GB (row count 25,236,160); cube
> > > >> >> size = 2.21 GB.
> > > >> >>
> > > >> >> I created a hierarchy dimension with 18 levels:
> > > >> >> Col1 -> Col2 -> ... up to Col18
> > > >> >> For these 18 levels, total cardinality = 2635.
> > > >> >>
> > > >> >> I attached 2 log files.
> > > >> >> Log1 - query with limit 1000000: a partial result came back.
> > > >> >> Log2 - clicked "Show All" in the query result and got
> > > >> >> ERROR: exception while executing query: Scan row count exceeded
> > > >> >> threshold: 4000000, please add filter condition to narrow down
> > > >> >> backend scan range, like where clause.
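A back-of-envelope check of Li Yang's point that transferring a big result can dominate the response time, using Parkavi's figures (3.27 GB, 25,236,160 rows). It assumes the Hive table's on-disk size is a fair proxy for row width, which ignores compression and file format, and it reads GB as GiB; `ResultSizeEstimate` is a hypothetical helper.

```java
// Rough estimate only: on-disk Hive size is treated as a proxy for row width.
class ResultSizeEstimate {
    static final double BYTES_PER_GB = 1024.0 * 1024 * 1024; // assumption: GB means GiB

    static double bytesPerRow(double tableGb, long rowCount) {
        return tableGb * BYTES_PER_GB / rowCount;
    }

    public static void main(String[] args) {
        double perRow = bytesPerRow(3.27, 25_236_160L);
        // Size of a result that just reaches the 4,000,000-row threshold.
        double resultMib = perRow * 4_000_000 / (1024 * 1024);
        System.out.printf("~%.0f bytes/row, ~%.0f MiB for 4,000,000 rows%n", perRow, resultMib);
    }
}
```

With those assumptions this works out to about 139 bytes per row, so a 4,000,000-row result is on the order of 530 MiB before any serialization overhead, far more than the few thousand rows typical of interactive analysis.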
> > > >> >>
> > > >> >> Thanks,
> > > >> >> Parkavi.
> > > >> >>
> > > >> >> -----Original Message-----
> > > >> >> From: hongbin ma [mailto:[email protected]]
> > > >> >> Sent: Wednesday, May 13, 2015 7:15 AM
> > > >> >> To: dev
> > > >> >> Subject: Re: Increase query performance
> > > >> >>
> > > >> >> Before you expand your cluster, you might need to analyse why it's
> > > >> >> delivering poor performance.
> > > >> >>
> > > >> >> What is the size of your Hive fact table, and what is the
> > > >> >> cardinality of the dimension columns?
> > > >> >>
> > > >> >> If possible, run a query and paste that query's log from
> > > >> >> KYLIN_HOME/logs/kylin.log; we can help you check for any
> > > >> >> abnormalities. (Make sure you write a slightly different query, to
> > > >> >> avoid hitting the cache.)
> > > >> >>
> > > >> >> On Tue, May 12, 2015 at 2:04 PM, Parkavi Nandagopal
> > > >> >> <[email protected]> wrote:
> > > >> >>
> > > >> >> > Hi,
> > > >> >> >
> > > >> >> > I have installed Kylin and created a cube (3 GB) with only one
> > > >> >> > region server, and when I query the cube data, it takes a long
> > > >> >> > time to show the query result in the Kylin web UI.
> > > >> >> > If I add 3 or more region server nodes with a high configuration
> > > >> >> > and create a cube, will querying that cube give better
> > > >> >> > performance?
> > > >> >> >
> > > >> >> >
> > > >> >> > Thanks,
> > > >> >> > Parkavi.
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> --
> > > >> >> Regards,
> > > >> >>
> > > >> >> *Bin Mahone | 马洪宾*
> > > >> >> Apache Kylin: http://kylin.io
> > > >> >> Github: https://github.com/binmahone
> > > >> >>
> > > >> >
> > > >> >
> > > >
> > > >
> > >
> > >
> >
>