Re: Reply: Increase query performance

dong wang Tue, 19 May 2015 03:45:08 -0700

In addition, what's the difference between kylin.query.scan.threshold
inside kylin.properties and HARD_THRESHOLD inside class StorageContext?
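
For illustration only, here is a minimal sketch of how a memory-budget row estimate, a configured scan threshold, and a hard cap could interact. It is not Kylin's actual code: the class name, both method names, the 3 GiB budget, and above all the precedence rule (configured value first, memory-based estimate as fallback, hard cap always applied) are assumptions made for this example. Only the 4000000 constant and the property name come from this thread.

```java
import java.util.Properties;

// Hypothetical sketch; not taken from the Kylin code base.
final class ScanThresholdSketch {
    // Same value as the HARD_THRESHOLD constant quoted from StorageContext in this thread.
    static final int HARD_THRESHOLD = 4000000;
    // Property name as mentioned for kylin.properties; how Kylin reads it is not shown here.
    static final String PROP_KEY = "kylin.query.scan.threshold";

    // Rows that fit in a memory budget, mirroring "MEM_BUDGET_PER_QUERY / rowSizeEst".
    static int rowsWithinBudget(long memBudgetBytes, long rowSizeEstBytes) {
        if (rowSizeEstBytes <= 0) {
            throw new IllegalArgumentException("row size estimate must be positive");
        }
        long rowEst = memBudgetBytes / rowSizeEstBytes;
        // Clamp before narrowing so a huge estimate cannot overflow the int cast.
        return (int) Math.min(rowEst, Integer.MAX_VALUE);
    }

    // ASSUMPTION: a configured value wins when present, the fallback is used
    // otherwise, and the result never exceeds HARD_THRESHOLD.
    static int resolveThreshold(Properties props, int fallback) {
        String raw = props.getProperty(PROP_KEY);
        int configured = fallback;
        if (raw != null && !raw.trim().isEmpty()) {
            configured = Integer.parseInt(raw.trim());
        }
        return Math.min(configured, HARD_THRESHOLD);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical budget of 3 GiB and an estimated row size of 1 KiB.
        int byMemory = rowsWithinBudget(3L * 1024 * 1024 * 1024, 1024);
        System.out.println(resolveThreshold(props, byMemory));
        props.setProperty(PROP_KEY, "1000000");
        System.out.println(resolveThreshold(props, byMemory));
    }
}
```

Under these assumptions, a threshold set in the properties file could only lower the effective limit, while raising it past 4000000 would still require changing the constant and rebuilding.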


2015-05-19 16:29 GMT+08:00 dong wang <[email protected]>:

> Thanks Hua. From your tests and detailed explanation, it should be safe to
> change the value of the threshold directly and rebuild the code. However,
> I still have two questions about the threshold. In the code, there are two
> places that set the threshold:
> 1,         long rowEst = MEM_BUDGET_PER_QUERY / rowSizeEst;
>         context.setThreshold((int) rowEst);
>
> 2,         String propThreshold = connProps.getProperty(OLAPQuery.PROP_SCAN_THRESHOLD);
>         int threshold = Integer.valueOf(propThreshold);
>         olapContext.storageContext.setThreshold(threshold);
>
>
> Does anyone know what the purpose of these two settings is?
>
> 2015-05-18 11:40 GMT+08:00 Li Yang <[email protected]>:
>
>> A query goes through GUI -> Query Engine -> HBase. You can analyze the
>> response time of each step.
>>
>> - GUI, see browser console
>> - Query Engine, see the "===[QUERY]===" log lines. In the kylinlog1.txt,
>> it's 151 seconds
>> - HBase, see the "HBase Metrics" log line. However, I didn't see it in the
>> log files. Weird.
>>
>> Concerning the SQL that pulls all 18 columns, transferring the big result
>> alone may already eat up most of the time.
>>
>> Btw, Kylin is not designed for ETL purposes; we don't expect users to pull
>> millions of rows, at least not in usual cases. A result set is typically a
>> few thousand rows for interactive analysis.
>>
>> Cheers
>> Yang
>>
>> On Mon, May 18, 2015 at 10:23 AM, dong wang <[email protected]>
>> wrote:
>>
>> > Thanks Hua. Usually users don't need to fetch 4,000,000+ rows of
>> > results, but an intermediate query result may have many more than
>> > 4,000,000 rows. In your reply above, you mentioned that we can just
>> > change the value of the setting, then rebuild the code and restart
>> > Tomcat. Is that something you have already tested? Since there is
>> > currently so much data in the existing cubes, I have to make sure that
>> > all such operations are safe to perform.
>> >
>> > 2015-05-15 21:29 GMT+08:00 Adunuthula, Seshu <[email protected]>:
>> >
>> > > As a short term fix, does it make sense to make this a tunable
>> > > parameter and move this to a config file?
>> > >
>> > > On 5/15/15, 5:58 AM, "Huang Hua" <[email protected]> wrote:
>> > >
>> > > >Hi Dong,
>> > > >
>> > > >I don't think so. You can safely change that setting, but then you
>> > > >need to recompile Kylin to generate the new war (don't use deploy.sh,
>> > > >because that will wipe out all your Kylin HBase meta storage). After
>> > > >the war is generated, put it under the Tomcat webapps directory and
>> > > >restart Tomcat. That should work well.
>> > > >
>> > > >Best.
>> > > >Hua
>> > > >> -----Original Message-----
>> > > >> From: dev-return-1698-
>> > > >> [email protected] [mailto:
>> > dev-return-
>> > > >> [email protected]] On Behalf Of dong wang
>> > > >> Sent: May 15, 2015 18:54
>> > > >> To: [email protected]
>> > > >> Subject: Re: Increase query performance
>> > > >>
>> > > >> I found that the threshold is set in StorageContext.java. The
>> > > >> relevant piece of code is:
>> > > >> public class StorageContext {
>> > > >>
>> > > >>     public static final int HARD_THRESHOLD = 4000000;
>> > > >>
>> > > >>
>> > > >> So my question is: I have already built some segments
>> > > >> successfully. If I later make the threshold much larger, will it
>> > > >> affect the existing data in the cube storage?
>> > > >>
>> > > >> 2015-05-15 18:48 GMT+08:00 dong wang <[email protected]>:
>> > > >>
>> > > >> > Hi all, today I ran into the same problem, though my case may be
>> > > >> > stranger. The SQL is as follows:
>> > > >> > select count(*) from (select 1 from test1 where condtionx group
>> > > >> > by col1, col2, col3) t1
>> > > >> >
>> > > >> > Since the result of the subquery has more than 4000000 rows, the
>> > > >> > exception is thrown. However, the final result of the whole SQL
>> > > >> > is just 1 row. This kind of SQL is commonly used to obtain the
>> > > >> > total row count of a query for a paging feature.
>> > > >> >
>> > > >> > 2015-05-13 18:15 GMT+08:00 Parkavi Nandagopal <
>> [email protected]>:
>> > > >> >
>> > > >> >> After getting the error below (Scan row count exceeded
>> > > >> >> threshold: 4000000), Kylin stopped/crashed automatically.
>> > > >> >> Is Kylin a single point of failure?
>> > > >> >> How can it be made highly available?
>> > > >> >>
>> > > >> >> Thanks,
>> > > >> >> Parkavi.
>> > > >> >>
>> > > >> >>
>> > > >> >> -----Original Message-----
>> > > >> >> From: Parkavi Nandagopal
>> > > >> >> Sent: Wednesday, May 13, 2015 10:49 AM
>> > > >> >> To: dev; '[email protected]'
>> > > >> >> Subject: RE: Increase query performance
>> > > >> >>
>> > > >> >> Size of my Hive fact table = 3.27 GB (row count 25,236,160)
>> > > >> >> Cube size = 2.21 GB
>> > > >> >>
>> > > >> >> I created a hierarchy dimension with 18 levels:
>> > > >> >> Col1 -> Col2 -> ... up to Col18
>> > > >> >> For these 18 levels, total cardinality = 2635
>> > > >> >>
>> > > >> >> I attached 2 log files.
>> > > >> >> Log1 - query with limit 1000000
>> > > >> >> Partial result came.
>> > > >> >> Log2 - Clicked show all in Query result.
>> > > >> >> Getting ERROR: exception while executing query: Scan row count
>> > > >> >> exceeded threshold: 4000000, please add filter condition to
>> > > >> >> narrow down backend scan range, like where clause.
>> > > >> >>
>> > > >> >> Thanks,
>> > > >> >> Parkavi.
>> > > >> >>
>> > > >> >> -----Original Message-----
>> > > >> >> From: hongbin ma [mailto:[email protected]]
>> > > >> >> Sent: Wednesday, May 13, 2015 7:15 AM
>> > > >> >> To: dev
>> > > >> >> Subject: Re: Increase query performance
>> > > >> >>
>> > > >> >> Before you expand your cluster, you might need to analyse why
>> > > >> >> it's delivering poor performance.
>> > > >> >>
>> > > >> >> What is the size of your Hive fact table? What is the cardinality
>> > > >> >> of the dimension columns?
>> > > >> >>
>> > > >> >> If possible, run a query and paste that query's log from
>> > > >> >> KYLIN_HOME/logs/kylin.log. We can help you check for any
>> > > >> >> abnormalities. (Make sure you write a slightly different query,
>> > > >> >> to avoid hitting the cache.)
>> > > >> >>
>> > > >> >> On Tue, May 12, 2015 at 2:04 PM, Parkavi Nandagopal
>> > > >> >> <[email protected]>
>> > > >> >> wrote:
>> > > >> >>
>> > > >> >> > Hi ,
>> > > >> >> >
>> > > >> >> > I have installed Kylin and created a cube (3 GB in size) with
>> > > >> >> > only one region server. When I query the cube data, it takes
>> > > >> >> > a long time to show the query result in the Kylin web UI.
>> > > >> >> > If I add 3 or more region server nodes with a high
>> > > >> >> > configuration and then create a cube and query it, will that
>> > > >> >> > improve query performance?
>> > > >> >> >
>> > > >> >> >
>> > > >> >> > Thanks,
>> > > >> >> > Parkavi.
>> > > >> >> >
>> > > >> >> >
>> > > >> >> >
>> > > >> >>
>> > > >> >>
>> > > >> >>
>> > > >> >> --
>> > > >> >> Regards,
>> > > >> >>
>> > > >> >> *Bin Mahone | 马洪宾*
>> > > >> >> Apache Kylin: http://kylin.io
>> > > >> >> Github: https://github.com/binmahone
>> > > >> >>
>> > > >> >
>> > > >> >
>> > > >
>> > > >
>> > >
>> > >
>> >
>>
>
>
