In addition, what's the difference between kylin.query.scan.threshold in kylin.properties and HARD_THRESHOLD in the StorageContext class?
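The thread does not settle how these values relate. The sketch below is a hedged illustration only. It models the two code paths dong wang quotes from the Kylin source: a memory budget divided by an estimated row size, and a connection property. It also adds one assumption that nothing in the thread confirms: any requested value is clamped to HARD_THRESHOLD. The 1 GiB value for MEM_BUDGET_PER_QUERY, the property key `scan_threshold` (standing in for OLAPQuery.PROP_SCAN_THRESHOLD), and the clamp rule are all hypothetical. Check StorageContext.setThreshold in your own Kylin version before relying on this model.

```java
import java.util.Properties;

/**
 * Hypothetical model of how a per-query scan threshold could be resolved.
 * Names mirror the thread; the budget value, property key and clamp rule
 * are assumptions, not Kylin's actual implementation.
 */
public class ThresholdSketch {

    // Value quoted in the thread for StorageContext.HARD_THRESHOLD.
    static final int HARD_THRESHOLD = 4_000_000;

    // ASSUMPTION: 1 GiB budget, for illustration only.
    static final long MEM_BUDGET_PER_QUERY = 1L << 30;

    // ASSUMPTION: hypothetical key standing in for OLAPQuery.PROP_SCAN_THRESHOLD.
    static final String PROP_SCAN_THRESHOLD = "scan_threshold";

    /** Path 1: derive a row limit from the memory budget and an estimated row size in bytes. */
    static int thresholdFromMemory(int rowSizeEst) {
        if (rowSizeEst <= 0) {
            throw new IllegalArgumentException("row size estimate must be positive");
        }
        long rowEst = MEM_BUDGET_PER_QUERY / rowSizeEst;
        return clamp(rowEst);
    }

    /** Path 2: read an explicit limit from connection properties, falling back to the hard cap. */
    static int thresholdFromProps(Properties connProps) {
        String propThreshold = connProps.getProperty(PROP_SCAN_THRESHOLD);
        if (propThreshold == null) {
            return HARD_THRESHOLD;
        }
        return clamp(Long.parseLong(propThreshold.trim()));
    }

    /** ASSUMPTION: never exceed the hard cap, never go below one row. */
    static int clamp(long requested) {
        return (int) Math.max(1, Math.min(requested, HARD_THRESHOLD));
    }

    public static void main(String[] args) {
        System.out.println(thresholdFromMemory(500)); // prints 2147483 (under the cap)
        System.out.println(thresholdFromMemory(100)); // 10737418 requested, prints 4000000
        Properties p = new Properties();
        p.setProperty(PROP_SCAN_THRESHOLD, "1000000");
        System.out.println(thresholdFromProps(p));    // prints 1000000
    }
}
```

If Kylin does clamp this way, raising kylin.query.scan.threshold alone would not lift the limit past HARD_THRESHOLD. That would be consistent with Hua's advice to change the constant and rebuild.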
2015-05-19 16:29 GMT+08:00 dong wang:

Thanks Hua. From your tests and detailed explanation, it should be safe to change the value of the threshold directly and rebuild the code. However, I still have 2 questions about the threshold. In the code, there are 2 places that set the threshold:

1. long rowEst = MEM_BUDGET_PER_QUERY / rowSizeEst;
   context.setThreshold((int) rowEst);

2. String propThreshold = connProps.getProperty(OLAPQuery.PROP_SCAN_THRESHOLD);
   int threshold = Integer.valueOf(propThreshold);
   olapContext.storageContext.setThreshold(threshold);

Does anyone know what the purposes of these 2 settings are?

2015-05-18 11:40 GMT+08:00 Li Yang:

A query goes through GUI -> Query Engine -> HBase. You can analyze the response time of each step.

- GUI: see the browser console.
- Query Engine: see the "===[QUERY]===" log lines. In kylinlog1.txt, it's 151 seconds.
- HBase: see the "HBase Metrics" log line. However, I didn't see it in the log files. Weird.

Concerning the SQL that pulls all 18 columns, transferring the big result alone may already eat up most of the time.

Btw, Kylin is not designed for ETL purposes. We don't expect users to pull millions of rows, at least not in usual cases. A result set is typically a few thousand rows for interactive analysis.

Cheers
Yang

On Mon, May 18, 2015 at 10:23 AM, dong wang wrote:

Thanks Hua. Usually users don't need to fetch 4,000,000+ rows of results, but intermediate query results may contain far more than 4,000,000 rows. In your reply above, you mentioned that we can just change the value of the setting, then rebuild the code and restart Tomcat. Is that what you have already tested? Since there is currently so much data in the existing cubes, I have to make sure that all such operations are safe to perform.

2015-05-15 21:29 GMT+08:00 Adunuthula, Seshu:

As a short-term fix, does it make sense to make this a tunable parameter and move it to a config file?

On 5/15/15, 5:58 AM, "Huang Hua" wrote:

Hi Dong,

I don't think so. You can safely change that setting, but then you need to recompile Kylin to generate the new war (don't use deploy.sh, because that will wipe out all your Kylin HBase metadata storage). After the war is generated, put it under the Tomcat webapps directory and restart Tomcat. That should work well.

Best.
Hua

-----Original Message-----
From: dong wang (via the dev mailing list)
Sent: May 15, 2015 18:54
To: dev
Subject: Re: Increase query performance

I found that the threshold setting is located in StorageContext.java. The related piece of code is:

public class StorageContext {

    public static final int HARD_THRESHOLD = 4000000;
    ...
}

So I have a question: I have already built some segments successfully. Later on, if I make the threshold much greater, will it affect the existing data in the cube storage?
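Hua's answer implies the threshold is a query-time check, not something stored with built segments. The following is a minimal sketch under that assumption. A hypothetical ScanGuard wraps the iterator returning rows from storage and throws once more rows than the threshold have been read. Its error text is modeled on the message reported in this thread. Nothing in it touches stored data, so in this model changing the limit cannot affect existing cubes. Whether Kylin's check uses `>` or `>=` is not stated in the thread; `>` is an assumption here.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical query-time scan guard (not Kylin's class). The limit lives only
 * in the running query; no segment or stored data is read or modified by it.
 */
public class ScanGuard<T> implements Iterator<T> {

    private final Iterator<T> source;
    private final int threshold;
    private long scanned = 0;

    public ScanGuard(Iterator<T> source, int threshold) {
        this.source = source;
        this.threshold = threshold;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public T next() {
        if (!source.hasNext()) {
            throw new NoSuchElementException();
        }
        T row = source.next();
        scanned++;
        // ASSUMPTION: fail on the first row beyond the threshold.
        if (scanned > threshold) {
            throw new IllegalStateException("Scan row count exceeded threshold: " + threshold
                    + ", please add filter condition to narrow down backend scan range, like where clause.");
        }
        return row;
    }

    /** Counts every row through the guard, the way an outer COUNT(*) consumes a subquery. */
    public static <T> long countAll(Iterator<T> rows, int threshold) {
        ScanGuard<T> guard = new ScanGuard<>(rows, threshold);
        long n = 0;
        while (guard.hasNext()) {
            guard.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<Integer> rows = List.of(1, 2, 3, 4, 5);
        System.out.println(countAll(rows.iterator(), 10)); // prints 5
        try {
            countAll(rows.iterator(), 3);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // the threshold error
        }
    }
}
```

countAll mirrors the count(*)-over-subquery pattern. The caller gets back one number, yet every row feeding that number passes through the guard. So the limit can trip even when the final result is a single row.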
2015-05-15 18:48 GMT+08:00 dong wang:

Hi all, today I also ran into the same problem, though mine may be even stranger. The SQL is the following:

select count(*) from (select 1 from test1 where conditionx group by col1, col2, col3) t1

Since the subquery returns more than 4000000 rows, the exception is thrown. However, the final result of the whole SQL is just 1 row. This kind of SQL is commonly used to obtain the total row count of a query for a paging feature.

2015-05-13 18:15 GMT+08:00 Parkavi Nandagopal:

After getting the error below (Scan row count exceeded threshold: 4000000), Kylin stopped/crashed automatically.
Is Kylin a single point of failure?
How can we make it highly available?

Thanks,
Parkavi.

-----Original Message-----
From: Parkavi Nandagopal
Sent: Wednesday, May 13, 2015 10:49 AM
To: dev
Subject: RE: Increase query performance

Size of my Hive fact table = 3.27 GB (row count 25,236,160)
Cube size = 2.21 GB

I created a hierarchy dimension with 18 levels:
Col1 -> Col2 -> ... up to Col18
For these 18 levels, total cardinality = 2635

I attached 2 log files.
Log1 - query with limit 1000000. A partial result came back.
Log2 - clicked "Show All" in the query result.
Getting ERROR: exception while executing query: Scan row count exceeded threshold: 4000000, please add filter condition to narrow down backend scan range, like where clause.

Thanks,
Parkavi.

-----Original Message-----
From: hongbin ma
Sent: Wednesday, May 13, 2015 7:15 AM
To: dev
Subject: Re: Increase query performance

Before you expand your cluster, you might need to analyze why it's delivering poor performance.

What is the size of your Hive fact table? What is the cardinality of the dimension columns?

If possible, run a query and paste that query's log from KYLIN_HOME/logs/kylin.log. We can help you check for any abnormalities. (Make sure you write a slightly different query, to avoid hitting the cache.)

--
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone

On Tue, May 12, 2015 at 2:04 PM, Parkavi Nandagopal wrote:

Hi,

I have installed Kylin and created a cube (3 GB in size) with only one region server. When I query the cube data, it takes a long time to show the query result in the Kylin web UI.
If I add 3 or more region server nodes with a high configuration, then create a cube and query it, will that increase query performance?

Thanks,
Parkavi.
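Both Li Yang and hongbin ma suggest reading kylin.log to see where a query spends its time. The following is a minimal sketch that assumes only the two markers quoted in this thread, "===[QUERY]===" and "HBase Metrics". It prints each matching line plus an optional number of following lines, because any timing details may sit on the lines after a marker. The class name, the argument handling, and the default path built from KYLIN_HOME/logs/kylin.log are illustrative choices, not a Kylin tool.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: extract query-timing markers from kylin.log. */
public class QueryLogFilter {

    // Markers named in this thread; the rest of the log layout is not assumed.
    static final List<String> MARKERS = List.of("===[QUERY]===", "HBase Metrics");

    /** Keeps each line containing a marker, plus up to `after` lines that follow it. */
    static List<String> filter(List<String> lines, int after) {
        List<String> out = new ArrayList<>();
        int remaining = 0;
        for (String line : lines) {
            if (MARKERS.stream().anyMatch(line::contains)) {
                out.add(line);
                remaining = after;
            } else if (remaining > 0) {
                out.add(line);
                remaining--;
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java QueryLogFilter [path/to/kylin.log] [linesAfterMarker]
        String home = System.getenv().getOrDefault("KYLIN_HOME", ".");
        Path log = args.length > 0 ? Paths.get(args[0]) : Paths.get(home, "logs", "kylin.log");
        int after = args.length > 1 ? Integer.parseInt(args[1]) : 0;
        for (String line : filter(Files.readAllLines(log), after)) {
            System.out.println(line);
        }
    }
}
```

Comparing the time around the "===[QUERY]===" lines with the "HBase Metrics" line (when present) helps separate query-engine time from storage time, which is the breakdown Li Yang describes.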
