You can scale out the Kylin deployment. I am curious to know the details of
your deployment:

- Number of Kylin Server nodes
- Number of HBase nodes

Increasing these should increase your concurrency levels…

Regards,
Seshu Adunuthula

On 11/12/15, 1:31 AM, "Li Yang" <[email protected]> wrote:

>How frequent are the 100 concurrent queries? Also a `jstack` dump of the
>hanging process might help.
>
>We haven't done stress testing for a while. However, eBay production is
>quite stable regarding Kylin itself.
>
>The only similar issue we had before was a query that was extremely busy
>on CPU. Such a slow query cannot be interrupted, and thus cannot be killed
>by BadQueryDetector. Again, the `jstack` command should reveal such an
>issue.
>
>On Thu, Nov 12, 2015 at 11:12 AM, ShaoFeng Shi <[email protected]>
>wrote:
>
>> Hi Chun En, did you analyze those “bad” SQLs to see whether they are well
>> matched with the cube design? Kylin doesn't guarantee that every query
>> returns in a very short time, but 80396 seconds needs the administrator's
>> attention. If the query is good, HBase is good, memory is sufficient, and
>> CPU is at a normal level, you need to investigate what the real
>> bottleneck is.
>>
>> Previously, in the eBay deployment, we encountered an extreme case (37
>> dimensions, separated into a couple of aggregation groups): when a query
>> crossed aggregation groups, it took a very long time. Later we identified
>> the bottleneck and made an enhancement in Kylin v1.1; after that we did
>> not observe the issue again. As you already use Kylin 1.1, I don't think
>> this is the case here. You may need to investigate further or provide
>> more detailed information here for analysis.
>>
>> 2015-11-11 17:32 GMT+08:00 nichunen <[email protected]>:
>>
>> > Hi,
>> >
>> > We did a stress test on our Kylin server with 100 concurrent queries.
>> > It worked fine at first. But after 1 day, we couldn't query Kylin any
>> > more, and there were log entries like "query has been running 80396
>> > seconds"; many "bad queries" were hung there.
>> > HBase nodes were still alive, and the cubes and jobs could still be
>> > listed on the pages. To check whether the problem was in HBase, I
>> > restarted HBase and ran a new query. No log from the region server
>> > showed that HBase received the query; as far as we know, a successful
>> > query creates log entries like "Klin Coprocessor start; Klin Coprocessor
>> > aggregation done". And in kylin.log, there were still hung queries.
>> >
>> > Do you know what caused the problem? In our opinion, it may be because
>> > of:
>> > 1. We use Kylin 1.1 on HBase 1.0.1.1 (I modified the HBase version in
>> > pom.xml to create the package);
>> > 2. The Tomcat max threads setting; we didn't modify any setting in
>> > Tomcat;
>> > 3. A problem in Kylin itself.
>> >
>> > BTW, we read the code of BadQueryDetector, and it seems a query thread
>> > is killed only when available memory is low and the query has been
>> > running for 5 minutes. We doubt this is reasonable.
>> >
>> > Best Regards,
>> >
>> > George/倪春恩
>> > Software Engineer
>> > 北京明略软件系统有限公司 (www.mininglamp.com)
>> > F4, 1#, Zhongmei Construction Group Plaza, 398# Zhongdong Road,
>> > Changping District, Beijing, 102218
>> >
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi
>>
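
Li Yang's suggestion above is to take a `jstack` dump of the hanging process and look for a query thread that is busy on CPU. As a rough aid for reading such a dump, here is a minimal Python sketch that counts threads by state and lists the RUNNABLE ones with their top stack frame. It assumes the usual HotSpot text layout (a quoted thread name on the header line, then a `java.lang.Thread.State:` line, then `at ...` frames). The sample dump, the thread names, and the `com.example.QueryWorker` frames are made up for illustration and do not come from Kylin or from this thread.

```python
import re
from collections import Counter

# Header line of a thread entry, e.g.  "some-thread-name" #30 daemon prio=5 ...
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# State line, e.g.  java.lang.Thread.State: WAITING (parking)
THREAD_STATE = re.compile(r'^\s+java\.lang\.Thread\.State: (?P<state>[A-Z_]+)')
# Stack frame line, e.g.  at com.example.Foo.bar(Foo.java:12)
FRAME = re.compile(r'^\s+at (?P<frame>.+)$')


def parse_jstack(dump_text):
    """Return one dict per thread: name, state, and top stack frame."""
    threads = []
    current = None
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = {"name": header.group("name"),
                       "state": None, "top_frame": None}
            threads.append(current)
            continue
        if current is None:
            continue  # preamble before the first thread entry
        state = THREAD_STATE.match(line)
        if state:
            current["state"] = state.group("state")
            continue
        frame = FRAME.match(line)
        if frame and current["top_frame"] is None:
            current["top_frame"] = frame.group("frame").strip()
    return threads


def summarize(threads):
    """Count threads per state, ignoring entries without a state line."""
    return Counter(t["state"] for t in threads if t["state"])


def runnable_threads(threads, name_prefix=""):
    """Threads in RUNNABLE state whose name starts with name_prefix."""
    return [t for t in threads
            if t["state"] == "RUNNABLE" and t["name"].startswith(name_prefix)]


# Synthetic two-thread dump, for illustration only.
SAMPLE_DUMP = '''
"http-bio-7070-exec-1" #30 daemon prio=5 os_prio=0 tid=0x01 nid=0x1a2b runnable [0x02]
   java.lang.Thread.State: RUNNABLE
    at com.example.QueryWorker.scan(QueryWorker.java:42)
    at com.example.QueryWorker.run(QueryWorker.java:17)

"http-bio-7070-exec-2" #31 daemon prio=5 os_prio=0 tid=0x03 nid=0x1a2c waiting on condition [0x04]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
'''

if __name__ == "__main__":
    parsed = parse_jstack(SAMPLE_DUMP)
    print(dict(summarize(parsed)))
    for t in runnable_threads(parsed):
        print(t["name"], "->", t["top_frame"])
```

In practice the input would be the saved output of `jstack <pid>` for the Kylin server process. Taking two or three dumps a few seconds apart and comparing which request threads stay RUNNABLE in the same frame is one way to tell a CPU-bound query from threads that are merely waiting.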
