How frequent are the 100 concurrent queries? Also a `jstack` dump of the hanging process might help.
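
A side note on the `jstack` suggestion: `jstack <pid>` is the JDK tool that attaches to a running JVM and prints every thread's stack, which is what you want for a hung server. As a rough illustration of what to look for in such a dump, the sketch below lists the RUNNABLE threads of the JVM it runs in, together with their top stack frame. It uses the standard `ThreadMXBean` API and only sees its own process, so it is not a replacement for `jstack`; the class name `RunnableThreads` is made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Kylin: an in-process look at busy threads.
public class RunnableThreads {

    /** Returns "thread name: top stack frame" for every RUNNABLE thread in the current JVM. */
    public static List<String> runnableThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> out = new ArrayList<>();
        // false, false: skip lock details, we only want states and stacks
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.RUNNABLE) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            String top = stack.length > 0 ? stack[0].toString() : "(no stack)";
            out.add(info.getThreadName() + ": " + top);
        }
        return out;
    }

    public static void main(String[] args) {
        runnableThreads().forEach(System.out::println);
    }
}
```

In a real `jstack` dump, a query thread that stays RUNNABLE in the same frames across several dumps taken a few seconds apart is a likely candidate for the CPU-bound case.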
We haven't done stress testing for a while. However, eBay production has been quite stable as far as Kylin itself is concerned. The only similar issue we had before was a query that was extremely CPU-intensive. Such a slow query cannot be interrupted, and thus cannot be killed by BadQueryDetector. Again, the `jstack` command should reveal such an issue.

On Thu, Nov 12, 2015 at 11:12 AM, ShaoFeng Shi <[email protected]> wrote:

> Hi Chun En, did you analyze those "bad" SQLs to see whether they are well
> matched with the cube design? Kylin doesn't guarantee that every query
> returns in a very short time, but 80396 seconds needs the administrator's
> attention. If the query is good, HBase is good, memory is sufficient, and
> CPU is at a normal level, you need to investigate what the real bottleneck
> is.
>
> Previously, in the eBay deployment, we encountered an extreme case (37
> dimensions, separated into a couple of aggregation groups): when a query
> crossed aggregation groups, it took a very long time. Later we identified
> the bottleneck and made an enhancement in Kylin v1.1; after that we didn't
> observe the issue again. As you already use Kylin 1.1, I don't think this
> is the case here. You may need to investigate further or provide more
> detailed information here for analysis.
>
> 2015-11-11 17:32 GMT+08:00 nichunen <[email protected]>:
>
> > Hi,
> >
> > We did a stress test of our Kylin server with 100 concurrent queries. It
> > worked fine at first, but after 1 day we couldn't query Kylin any more.
> > The log contained entries like "query has been running 80396 seconds",
> > and many "bad queries" were hung. The HBase nodes were still alive, and
> > the cubes and jobs could still be listed on the pages. To check whether
> > HBase was the problem, I restarted HBase and ran a new query. No region
> > server log showed that HBase had received the query; as far as we know,
> > a successful query creates log entries like "Klin Coprocessor start;
> > Klin Coprocessor aggregation done". And in kylin.log, there were still
> > hung queries.
> >
> > Do you know what caused the problem? In our opinion, it may be one of
> > the following:
> > 1. We use Kylin 1.1 on HBase 1.0.1.1 (I modified the HBase version in
> > pom.xml to create the package);
> > 2. The Tomcat max threads setting; we didn't modify any settings in
> > Tomcat;
> > 3. A problem in Kylin itself.
> >
> > BTW, we read the code of BadQueryDetector, and it seems a query thread
> > will be killed only when available memory is low and the query has been
> > running for 5 minutes. We doubt that this is very reasonable.
> >
> > Best Regards,
> >
> > George/倪春恩
> >
> > Software Engineer/软件工程师
> >
> > Mobile:+86-13501723787| Fax:+8610-56842040
> >
> > 北京明略软件系统有限公司 (www.mininglamp.com <http://www.semidata.com/>)
> >
> > 北京市昌平区东小口镇中东路398号中煤建设集团大厦1号楼4层
> >
> > F4,1#,Zhongmei Construction Group Plaza,398# Zhongdong Road,Changping
> > District,Beijing,102218
>
> --
> Best regards,
>
> Shaofeng Shi
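
P.S. On the point that a CPU-bound query cannot be interrupted: in Java, `Thread.interrupt()` only sets a flag (or wakes a thread blocked in certain calls), so a loop that never checks the flag keeps running. The sketch below is a minimal, self-contained illustration of that general JVM behavior. It is not Kylin's code, and the names `InterruptDemo`, `busyLoop`, and `cooperativeLoop` are made up for this example.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo, not Kylin code: interrupt() cannot stop a loop that ignores it.
public class InterruptDemo {

    /** Starts the work, interrupts it after 100 ms, and reports whether it exited within 1 s. */
    public static boolean exitsAfterInterrupt(Runnable work) {
        Thread t = new Thread(work, "demo-query");
        t.setDaemon(true); // a loop that never exits must not keep the JVM alive
        t.start();
        try {
            Thread.sleep(100);
            t.interrupt();  // only sets the interrupt flag for a running thread
            t.join(1000);   // give the worker up to one second to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !t.isAlive();
    }

    /** CPU-bound loop that never looks at the interrupt flag. */
    public static Runnable busyLoop(AtomicLong counter) {
        return () -> {
            while (true) {
                counter.incrementAndGet();
            }
        };
    }

    /** Same work, but it checks the interrupt flag on every iteration. */
    public static Runnable cooperativeLoop(AtomicLong counter) {
        return () -> {
            while (!Thread.currentThread().isInterrupted()) {
                counter.incrementAndGet();
            }
        };
    }

    public static void main(String[] args) {
        System.out.println("busy loop exited: "
                + exitsAfterInterrupt(busyLoop(new AtomicLong())));
        System.out.println("cooperative loop exited: "
                + exitsAfterInterrupt(cooperativeLoop(new AtomicLong())));
    }
}
```

The busy loop should still be alive after the interrupt, while the cooperative one exits. A query thread stuck in the first pattern can only be found from the outside, for example with `jstack`.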
