The fuzzy filter finding is great, but there still has to be a protection mechanism that prevents the coprocessor from exhausting the region server's memory or CPU after the client has timed out. Filing a JIRA for this:
https://issues.apache.org/jira/browse/KYLIN-1050

On Wed, Sep 23, 2015 at 4:46 PM, hongbin ma <[email protected]> wrote:

> If you have many filters or IN clauses in your query, Kylin will generate
> a lot of fuzzy keys for the HBase scan. A moderate number of fuzzy keys
> benefits HBase scanning, but when the number of fuzzy keys grows too
> large, scan performance degrades dramatically, because FuzzyKeyFilter
> explores a large space of possibilities. There is no easy way to overcome
> this issue; see my patch to HBase at:
>
> https://issues.apache.org/jira/browse/HBASE-14269
>
> The side effect is the high CPU usage you're observing.
>
> So in https://issues.apache.org/jira/browse/KYLIN-740, whenever we find
> that too many fuzzy filters have been generated (using a magic number as
> the threshold), we discard them all and scan HBase without any fuzzy keys.
>
> Hope this is useful to you.
>
> On Tue, Sep 22, 2015 at 11:26 PM, vipul jhawar <[email protected]>
> wrote:
>
> > Looks like attachments are stripped off the email.
> > Here is a screenshot -
> > https://monosnap.com/file/JmpHEMxJVVQUhTLxTrzE1sWDn7gXg4
> >
> > On Tue, Sep 22, 2015 at 5:32 PM, vipul jhawar <[email protected]>
> > wrote:
> >
> > > Hi hongbin
> > >
> > > It is attached in the previous reply.
> > > Attached again.
> > >
> > > Thanks
> > >
> > > On Tue, Sep 22, 2015 at 11:58 AM, hongbin ma <[email protected]>
> > > wrote:
> > >
> > >> hi
> > >>
> > >> Did you forget to attach the screenshot?
> > >>
> > >> On Tue, Sep 22, 2015 at 12:11 PM, vipul jhawar <[email protected]>
> > >> wrote:
> > >>
> > >> > Hi
> > >> >
> > >> > We are on Kylin 0.7.2.
> > >> > A screenshot of the call stack is attached for reference.
> > >> >
> > >> > Yesterday we did some more debugging and added a timeout check in
> > >> > the coprocessor (AggregationScanner -> buildAggrCache), similar to
> > >> > the checkMemoryUsage() check in the coprocessor, but when we
> > >> > enabled fuzzy keys it simply remained stuck for hours.
> > >> > It's not even looping: even with timeout checks of 1 min, the
> > >> > timeout never fired, yet the coprocessor was hung for a long time
> > >> > and we had to bounce the regionserver. Could you explain what is
> > >> > causing the coprocessor to remain hung for so long without even
> > >> > looping? Is it just stuck on the scan forever?
> > >> >
> > >> > When we disable the fuzzy keys, the timeout does get executed.
> > >> > On further analysis we tried reducing the fuzzy_value_cap and
> > >> > brought it down to 20.
> > >> > The problem is that when we switch on fuzzy keys and have filters
> > >> > that lead to an IN clause, the coprocessor is not deterministic:
> > >> > sometimes it goes into a spin and sometimes it executes fine. That
> > >> > is an issue because we need deterministic performance and do not
> > >> > want the coprocessor to run forever. Some queries run fine and are
> > >> > very fast, and some just get stuck forever.
> > >> >
> > >> > The client times out with an RPC timeout, but the coprocessor
> > >> > thread just hogs the CPU.
> > >> >
> > >> > Please comment.
> > >> >
> > >> > Thanks
> > >> >
> > >> > On Tue, Sep 22, 2015 at 7:14 AM, hongbin ma <[email protected]>
> > >> > wrote:
> > >> >
> > >> >> hi vipul,
> > >> >>
> > >> >> What version are you using?
> > >> >> Before https://issues.apache.org/jira/browse/KYLIN-740 we did spot
> > >> >> some critical performance issues caused by many IN clauses. If you
> > >> >> could provide a CPU/heap analysis (on your HBase region server),
> > >> >> it would be easier to address the problem.
> > >> >>
> > >> >> On Mon, Sep 21, 2015 at 10:42 PM, vipul jhawar
> > >> >> <[email protected]> wrote:
> > >> >>
> > >> >> > Hi
> > >> >> >
> > >> >> > We have noticed a pattern that causes the coprocessor to spike
> > >> >> > the regionserver CPU to 100% over time.
> > >> >> > If we issue a query through Kylin that involves scanning a lot
> > >> >> > of data (say, multiple days with multiple filters on many
> > >> >> > dimensions), it has to scan a large number of rows. If it
> > >> >> > doesn't return within the required RPC timeout, the client does
> > >> >> > get an error message with the exception, but on the regionserver
> > >> >> > we see no end to the processing, and it ultimately hogs the
> > >> >> > regionserver.
> > >> >> >
> > >> >> > Are there any configs on the coprocessor that can say: if the
> > >> >> > processing is not completed in N time, then simply time out?
> > >> >> > That way we can look at the queries later but avoid the CPU
> > >> >> > spike, which makes the cluster unusable.
> > >> >> >
> > >> >> > Thanks
> > >> >>
> > >> >> --
> > >> >> Regards,
> > >> >>
> > >> >> *Bin Mahone | 马洪宾*
> > >> >> Apache Kylin: http://kylin.io
> > >> >> Github: https://github.com/binmahone
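
For illustration, the kind of timeout check discussed in this thread could look roughly like the sketch below. It is plain Java, not Kylin or HBase code: `DeadlineGuardSketch`, `aggregateWithDeadline`, and `ScanDeadlineExceededException` are hypothetical names, and an `Iterator<Long>` stands in for the region scanner. Note that the clock is only consulted between rows, so a single `next()` call that spends a long time inside a filter would never reach the check. That may be why the timeout did not fire when fuzzy keys were enabled, though the thread does not confirm it.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical exception type for this sketch; not a Kylin or HBase class.
class ScanDeadlineExceededException extends RuntimeException {
    ScanDeadlineExceededException(String message) {
        super(message);
    }
}

// Hypothetical sketch of a cooperative deadline check inside an aggregation loop.
class DeadlineGuardSketch {

    /**
     * Sums the values produced by `rows`, aborting once the deadline has passed.
     * `clockMillis` is injected so the behaviour can be exercised with a fake clock.
     */
    static long aggregateWithDeadline(Iterator<Long> rows,
                                      long timeoutMillis,
                                      LongSupplier clockMillis) {
        long deadline = clockMillis.getAsLong() + timeoutMillis;
        long sum = 0;
        long rowsSeen = 0;
        while (rows.hasNext()) {
            // The check runs once per row, before the row is consumed.
            // It cannot interrupt a call that is blocked inside rows.next().
            if (clockMillis.getAsLong() > deadline) {
                throw new ScanDeadlineExceededException(
                        "aborted after " + rowsSeen + " rows; timeout was "
                                + timeoutMillis + " ms");
            }
            sum += rows.next();
            rowsSeen++;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Long> rows = Arrays.asList(1L, 2L, 3L);
        long total = aggregateWithDeadline(
                rows.iterator(), 60_000L, System::currentTimeMillis);
        System.out.println("total = " + total);
    }
}
```

Throwing an exception, rather than returning a partial sum, keeps a timed-out scan from being mistaken for a complete result. Whether the real coprocessor should fail the request or return partial data is a design decision this sketch does not settle.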
