Hi,

We are on Kylin 0.7.2. A screenshot of the call stack is attached for reference.
Yesterday we did some more debugging. We added a timeout check in the coprocessor at AggregationScanner -> buildAggrCache, similar to the existing checkMemoryUsage() check in the coprocessor. With fuzzy keys enabled, however, the coprocessor simply remains stuck for hours. It does not even appear to be looping: with a timeout check of 1 minute, the timeout never fired, yet the coprocessor stayed hung for a long time and we had to bounce the regionserver. Could you explain what causes the coprocessor to remain hung for so long without even iterating? Is it just stuck on the scan forever? When we disable fuzzy keys, the timeout check does get executed.

On further analysis we tried reducing fuzzy_value_cap and brought it down to 20. The problem is that when fuzzy keys are switched on and the filters lead to an IN clause, the coprocessor is not deterministic: sometimes it goes into a spin and sometimes it executes fine. This is an issue because we need deterministic performance and do not want the coprocessor to run forever. Some queries run fine and are very fast, and some just get stuck forever. The client times out with an RPC timeout, but the coprocessor thread just hogs the CPU.

Please comment.

Thanks

On Tue, Sep 22, 2015 at 7:14 AM, hongbin ma <[email protected]> wrote:

> hi vipul,
>
> what version are you using? before
> https://issues.apache.org/jira/browse/KYLIN-740 we did spot some critical
> performance issues caused by many IN clauses, if you could help to provide
> a CPU/heap analysis (on your hbase's region server) it would be easier to
> address the problem.
>
> On Mon, Sep 21, 2015 at 10:42 PM, vipul jhawar <[email protected]>
> wrote:
>
> > Hi
> >
> > Have noticed a pattern which caused the coprocessor to spike the
> > regionserver CPU to 100% over time.
> >
> > If we end up issuing a query through Kylin which involves scanning a lot
> > of data, say multiple days with multiple filters on many dimensions, it
> > has to scan a large number of rows. If it doesn't return within the
> > required RPC timeout, the client does get an error message with the
> > exception, but on the regionserver we see no end to the processing and it
> > ultimately hogs the regionserver.
> >
> > Are there any configs on the coprocessor which can be set to say that
> > if the processing is not completed in N time, then simply time out? That
> > way we can look at the queries later but avoid the CPU spike, as it makes
> > the cluster unusable.
> >
> > Thanks
> >
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>
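For illustration, here is a minimal, self-contained Java sketch of the kind of per-row deadline check described in this thread. It is not the Kylin code: every name in it (DeadlineScanSketch, aggregateWithDeadline, slowRows, ScanTimeoutException) is hypothetical, and a plain Iterator stands in for the region scanner. What it shows is that a check placed inside the aggregation loop can only run when control comes back to the loop between rows. If a single call into the underlying scanner never returns, the check is never reached, which would be consistent with the 1 minute timeout never firing, although that is only one possible reading of the symptom.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.TimeUnit;

final class DeadlineScanSketch {

    /** Hypothetical exception raised when the loop notices its time budget is used up. */
    static class ScanTimeoutException extends RuntimeException {
        ScanTimeoutException(String message) {
            super(message);
        }
    }

    /**
     * Hypothetical stand-in for an aggregation loop: sums the first column of
     * every row and checks the deadline once per row.
     */
    static long aggregateWithDeadline(Iterator<long[]> innerScanner, long timeoutMillis) {
        long deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        long sum = 0;
        // If hasNext()/next() never returns, nothing below this line runs again.
        while (innerScanner.hasNext()) {
            long[] row = innerScanner.next();
            sum += row[0];
            // The check is only evaluated after the scanner has handed back a row.
            if (System.nanoTime() - deadlineNanos > 0) {
                throw new ScanTimeoutException("scan exceeded " + timeoutMillis + " ms");
            }
        }
        return sum;
    }

    /** Test helper (an assumption for this sketch): a scanner that takes sleepMillis per row. */
    static Iterator<long[]> slowRows(final int count, final long sleepMillis) {
        return new Iterator<long[]>() {
            private int produced = 0;

            @Override
            public boolean hasNext() {
                return produced < count;
            }

            @Override
            public long[] next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                produced++;
                return new long[] {1L};
            }
        };
    }

    public static void main(String[] args) {
        // Fast scanner: the loop finishes well inside the budget.
        long sum = aggregateWithDeadline(
                Arrays.asList(new long[] {1}, new long[] {2}, new long[] {3}).iterator(), 10_000);
        System.out.println("sum = " + sum);

        // Slow scanner: each row returns, so the per-row check gets a chance to fire.
        try {
            aggregateWithDeadline(slowRows(5, 20), 5);
        } catch (ScanTimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }
    }
}
```

In this sketch the slow scanner still returns rows, so the deadline is noticed after the first one. A scanner that spins inside a single next() call would never give the loop that chance, so a bound would have to be enforced somewhere inside that call instead.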
