Re: Re: Increase query performance

Huang Hua Sun, 17 May 2015 20:39:12 -0700

Yes, there is no risk to your underlying cube data. Let me explain a little 
more:
1. Cube data is mainly stored in HBase (metadata, final cube data) and 
HDFS (dictionaries, intermediate data);
2. kylin.war is a web app deployed in Tomcat which mainly does three things:
  2.1. reads/writes project, cube and table related info from/to HBase;
  2.2. handles cubing jobs, such as cube creation, cube building, etc.;
  2.3. answers SQL queries by translating SQL into HBase scans.


If, for example, you shut down Tomcat (kylin.sh stop), you get no cubing 
and no SQL queries, but all your cube data is still there on HBase and 
HDFS.
So you can go ahead and modify the Kylin source code to suit your needs, 
rebuild the package to get a new kylin.war, and replace the old one with it 
(remove the old webapps/kylin directory as well). When you restart Tomcat, 
it will generate a new webapps/kylin directory from the updated kylin.war 
for you.
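
As for the source change itself, here is one possible shape for it, in case you would rather not hard-code a new number. It is only a rough sketch of the "tunable parameter" idea raised earlier in this thread, not Kylin's actual code: the class name ScanThresholdSketch and the property name example.scan.threshold are made up for illustration, and only the default value 4000000 and the error text come from this thread.

```java
// Sketch only: ScanThresholdSketch and the property name below are
// hypothetical illustrations, not Kylin's actual code or configuration keys.
class ScanThresholdSketch {

    // Default mirrors the HARD_THRESHOLD value quoted in this thread.
    static final int DEFAULT_THRESHOLD = 4000000;

    // Hypothetical JVM system property used to override the default.
    static final String PROPERTY = "example.scan.threshold";

    // Returns the configured threshold, falling back to the default when the
    // property is missing, not a number, or not positive.
    static int resolveThreshold() {
        String raw = System.getProperty(PROPERTY);
        if (raw == null) {
            return DEFAULT_THRESHOLD;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : DEFAULT_THRESHOLD;
        } catch (NumberFormatException e) {
            return DEFAULT_THRESHOLD;
        }
    }

    // Rejects a scan once it has read more rows than the threshold allows.
    static void checkScanCount(long scannedRows, int threshold) {
        if (scannedRows > threshold) {
            throw new IllegalStateException(
                    "Scan row count exceeded threshold: " + threshold);
        }
    }

    public static void main(String[] args) {
        int threshold = resolveThreshold();
        System.out.println("Effective scan threshold: " + threshold);
        // A scan that reads exactly the threshold is still allowed.
        checkScanCount(threshold, threshold);
    }
}
```

With something like this, the limit could be raised at JVM startup (for example -Dexample.scan.threshold=8000000) without another rebuild. A larger limit only changes how many rows a query may scan, so the stored cube data is not involved.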

The safest way, which I recommend, is:
1. Use another machine (like a Linux box) instead of the production server to 
rebuild the code and get the new kylin.war;
2. Stop Tomcat on the production server and replace the kylin.war on the 
production server with the new one;
3. Remove the directory TOMCAT_HOME/webapps/kylin on your production server 
first and then start Tomcat.
In this process you don't lose any data: all the cube data is untouched and 
all your production configuration is untouched as well.
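
If you prefer to script steps 2 and 3 rather than do them by hand, a rough sketch could look like the following. WarSwapSketch is a hypothetical helper, not something shipped with Kylin. It assumes Tomcat has already been stopped, that the war lives at TOMCAT_HOME/webapps/kylin.war, and it leaves starting Tomcat to you.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch only: WarSwapSketch is a hypothetical helper, not part of Kylin.
// It assumes Tomcat has already been stopped before it runs.
class WarSwapSketch {

    static void swapWar(Path tomcatHome, Path newWar) throws IOException {
        Path webapps = tomcatHome.resolve("webapps");
        Path exploded = webapps.resolve("kylin");

        // Step 2: replace the old kylin.war with the newly built one.
        Files.copy(newWar, webapps.resolve("kylin.war"),
                StandardCopyOption.REPLACE_EXISTING);

        // Step 3: remove the exploded webapps/kylin directory so that Tomcat
        // unpacks the new war on its next start.
        if (Files.exists(exploded)) {
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(exploded)) {
                // Reverse order puts children before their parent directories.
                paths = walk.sorted(Comparator.reverseOrder())
                        .collect(Collectors.toList());
            }
            for (Path p : paths) {
                Files.delete(p);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: WarSwapSketch <TOMCAT_HOME> <new kylin.war>");
            return;
        }
        swapWar(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

It only touches files under the Tomcat webapps directory, so nothing in HBase or HDFS is read or written.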

Best,
Hua
> -----Original Message-----
> From: dev-return-1713-
> [email protected] [mailto:dev-return-
> [email protected]] On Behalf Of dong wang
> Sent: May 18, 2015 10:24
> To: [email protected]
> Subject: Re: Re: Increase query performance
> 
> Thanks Hua. Usually users don't need to fetch 4,000,000+ rows of the result,
> but an intermediate query result may contain far more than 4,000,000
> rows. In your reply above, you mentioned that we can just
> change the value of the setting, then rebuild the code and restart
> Tomcat. Have you already tested this? Since there is currently so
> much data in the existing cubes, I have to make sure that all such
> operations are safe to perform.
> 
> 2015-05-15 21:29 GMT+08:00 Adunuthula, Seshu <[email protected]>:
> 
> > As a short term fix, does it make sense to make this a tunable
> > parameter and move this to a config file?
> >
> > On 5/15/15, 5:58 AM, "Huang Hua" <[email protected]> wrote:
> >
> > >Hi Dong,
> > >
> > >I don't think so. You can safely change that setting, but then you
> > >need to recompile Kylin to generate the new war (don't use
> > >deploy.sh because that will wipe out all your Kylin HBase metadata
> > >storage). After the war is generated, put it under the Tomcat
> > >webapps directory and restart Tomcat. That should work well.
> > >
> > >Best.
> > >Hua
> > >> -----Original Message-----
> > >> From: dev-return-1698-
> > >> [email protected]
> > >> [mailto:dev-return-
> > >> [email protected]] On Behalf Of dong wang
> > >> Sent: May 15, 2015 18:54
> > >> To: [email protected]
> > >> Subject: Re: Increase query performance
> > >>
> > >> I found that the setting for the threshold is located in
> > >> StorageContext.java; the related piece of code is:
> > >> public class StorageContext {
> > >>
> > >>     public static final int HARD_THRESHOLD = 4000000;
> > >>
> > >>
> > >> So I have a question: I have already built some
> > >> segments successfully. If I later make the threshold much
> > >> larger, will it affect the existing data in the cube storage?
> > >>
> > >> 2015-05-15 18:48 GMT+08:00 dong wang <[email protected]>:
> > >>
> > >> > Hi all, today I also ran into the same problem, though
> > >> > my case may be stranger. The SQL is as follows:
> > >> > select count(*) from (select 1 from test1 where condtionx group
> > >> > by col1, col2, col3) t1
> > >> >
> > >> > Since the result of the subquery has more than 4000000 rows, the
> > >> > exception is thrown. However, the final result of the
> > >> > whole SQL is just 1 row. This kind of SQL is commonly used
> > >> > to obtain the total row count of a query for paging.
> > >> >
> > >> > 2015-05-13 18:15 GMT+08:00 Parkavi Nandagopal <[email protected]>:
> > >> >
> > >> >> After getting the error below (Scan row count exceeded threshold:
> > >> >> 4000000), Kylin stopped/crashed automatically.
> > >> >> Is Kylin a single point of failure?
> > >> >> How can we make it highly available?
> > >> >>
> > >> >> Thanks,
> > >> >> Parkavi.
> > >> >>
> > >> >>
> > >> >> -----Original Message-----
> > >> >> From: Parkavi Nandagopal
> > >> >> Sent: Wednesday, May 13, 2015 10:49 AM
> > >> >> To: dev; '[email protected]'
> > >> >> Subject: RE: Increase query performance
> > >> >>
> > >> >> Size of my Hive fact table = 3.27 GB (row count 25,236,160)
> > >> >> Cube size = 2.21 GB
> > >> >>
> > >> >> I created a hierarchy dimension with 18 levels:
> > >> >> Col1 -> Col2 -> ... up to Col18. For these 18 levels, total
> > >> >> cardinality = 2635.
> > >> >>
> > >> >> I attached 2 log files.
> > >> >> Log1 - query with limit 1000000.
> > >> >> A partial result came back.
> > >> >> Log2 - clicked "show all" in the query result.
> > >> >> Getting ERROR: exception while executing query: Scan row count
> > >> >> exceeded threshold: 4000000, please add filter condition to narrow
> > >> >> down backend scan range, like where clause.
> > >> >>
> > >> >> Thanks,
> > >> >> Parkavi.
> > >> >>
> > >> >> -----Original Message-----
> > >> >> From: hongbin ma [mailto:[email protected]]
> > >> >> Sent: Wednesday, May 13, 2015 7:15 AM
> > >> >> To: dev
> > >> >> Subject: Re: Increase query performance
> > >> >>
> > >> >> Before you expand your cluster, you might need to analyse why
> > >> >> it's delivering poor performance.
> > >> >>
> > >> >> What is the size of your Hive fact table? The cardinality of
> > >> >> the dimension columns?
> > >> >>
> > >> >> If possible, run a query and paste that query's log from
> > >> >> KYLIN_HOME/logs/kylin.log. We can help you check
> > >> >> for any abnormalities. (Make sure you write a slightly
> > >> >> different query, to avoid hitting the cache.)
> > >> >>
> > >> >> On Tue, May 12, 2015 at 2:04 PM, Parkavi Nandagopal
> > >> >> <[email protected]>
> > >> >> wrote:
> > >> >>
> > >> >> > Hi ,
> > >> >> >
> > >> >> > I have installed Kylin and created a cube (3 GB in size) with only
> > >> >> > one region server. When I query the cube data, it takes
> > >> >> > a long time to show the query result in the Kylin web UI.
> > >> >> > If I add 3 or more region server nodes with a high-end
> > >> >> > configuration, then create a cube and query it, will that
> > >> >> > improve the query performance?
> > >> >> >
> > >> >> >
> > >> >> > Thanks,
> > >> >> > Parkavi.
> > >> >> >
> > >> >> >
> > >> >>
> > >> >>
> > >> >>
> > >> >> --
> > >> >> Regards,
> > >> >>
> > >> >> *Bin Mahone | 马洪宾*
> > >> >> Apache Kylin: http://kylin.io
> > >> >> Github: https://github.com/binmahone
> > >> >>
> > >> >
> > >> >
> > >
> > >
> >
> >
