Re: coprocessor cause 100% cpu

Li Yang Mon, 28 Sep 2015 23:16:45 -0700

The fuzzy filter finding is a great one, but we still need a protection
mechanism that stops a coprocessor from exhausting the region server's memory
or CPU after the client has timed out. I have opened a JIRA for this:


https://issues.apache.org/jira/browse/KYLIN-1050
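
To make the idea concrete, here is a minimal sketch of a server-side deadline
guard. It is not Kylin's or HBase's code: the class, method, and exception
names are hypothetical, and a plain Iterator stands in for the region scanner.

```java
import java.util.Iterator;

final class DeadlineGuardSketch {
    /** Hypothetical exception: the server-side work outlived its budget. */
    static class ScanTimeoutException extends RuntimeException {
        ScanTimeoutException(String msg) {
            super(msg);
        }
    }

    /**
     * Sums the rows, checking the deadline once every checkInterval rows.
     * The Iterator is an assumption standing in for the real scanner.
     */
    static long sumWithDeadline(Iterator<Long> rows, long budgetNanos, int checkInterval) {
        long deadline = System.nanoTime() + budgetNanos;
        long sum = 0;
        long seen = 0;
        while (rows.hasNext()) {
            sum += rows.next();
            // Compare via subtraction, which is the overflow-safe idiom for nanoTime.
            if (++seen % checkInterval == 0 && System.nanoTime() - deadline > 0) {
                throw new ScanTimeoutException("gave up after " + seen + " rows");
            }
        }
        return sum;
    }
}
```

Note that a guard like this only runs between rows. If a single next() call
spends its time inside the filter, as reported further down in this thread,
the check is never reached, so the real fix probably needs a check inside the
scan path as well.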




On Wed, Sep 23, 2015 at 4:46 PM, hongbin ma <[email protected]> wrote:

> If you have many filters or IN clauses in your query, Kylin will generate a
> lot of fuzzy keys for the HBase scan. A moderate number of fuzzy keys speeds
> up scanning, but when the number of fuzzy keys grows too large, scan
> performance degrades dramatically, because FuzzyRowFilter has to explore a
> large space of possibilities. There is no easy way to overcome this; see my
> patch to HBase at:
>
> https://issues.apache.org/jira/browse/HBASE-14269
>
> The side-effect is the high CPU usage you're observing.
>
> So in https://issues.apache.org/jira/browse/KYLIN-740, whenever we find that
> too many fuzzy keys have been generated (using a magic number as the
> threshold), we discard them all and scan HBase without any fuzzy keys.
>
> Hope this is useful to you.
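
A rough sketch of the cap described above, for readers following along. It
assumes that fuzzy keys come from combining the values of each dimension's IN
list, which is an illustration rather than Kylin's actual code; the class and
method names and the cap parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class FuzzyKeyCapSketch {
    /**
     * Expands per-dimension IN lists into one value combination per fuzzy key.
     * Returns an empty list, meaning "scan without fuzzy keys", as soon as the
     * number of combinations would exceed cap.
     */
    static List<List<String>> fuzzyKeysOrNone(List<List<String>> inLists, int cap) {
        if (inLists.isEmpty()) {
            return Collections.emptyList();
        }
        List<List<String>> keys = new ArrayList<>();
        keys.add(new ArrayList<>());
        for (List<String> values : inLists) {
            // Check before expanding so an oversized product is never built.
            if ((long) keys.size() * values.size() > cap) {
                return Collections.emptyList();
            }
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : keys) {
                for (String v : values) {
                    List<String> key = new ArrayList<>(prefix);
                    key.add(v);
                    next.add(key);
                }
            }
            keys = next;
        }
        return keys;
    }
}
```

With IN lists of sizes 2 and 3, a cap of 10 yields 6 combinations, while a cap
of 5 discards them all and falls back to a scan with no fuzzy keys.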
>
> On Tue, Sep 22, 2015 at 11:26 PM, vipul jhawar <[email protected]>
> wrote:
>
> > Looks like attachments are stripped off the email.
> > Here is a screenshot -
> > https://monosnap.com/file/JmpHEMxJVVQUhTLxTrzE1sWDn7gXg4
> >
> > On Tue, Sep 22, 2015 at 5:32 PM, vipul jhawar <[email protected]>
> > wrote:
> >
> > > Hi hongbin
> > >
> > > It is attached in the previous reply.
> > > Attached again.
> > >
> > > Thanks
> > >
> > > On Tue, Sep 22, 2015 at 11:58 AM, hongbin ma <[email protected]>
> > > wrote:
> > >
> > >> hi
> > >>
> > >> did you forget to attach the screenshot?
> > >>
> > >> On Tue, Sep 22, 2015 at 12:11 PM, vipul jhawar <[email protected]>
> > >> wrote:
> > >>
> > >> > Hi
> > >> >
> > >> > We are on Kylin 0.7.2.
> > >> > A screenshot of the call stack is attached for reference.
> > >> >
> > >> > Yesterday we did some more debugging and added a timeout check in the
> > >> > coprocessor (AggregationScanner -> buildAggrCache), similar to the
> > >> > checkMemoryUsage() check in the coprocessor, but when we enabled fuzzy
> > >> > keys it simply remained stuck for hours.
> > >> > It is not even looping: even with a timeout check of 1 minute, the
> > >> > timeout never fired, yet the coprocessor hung for a long time and we
> > >> > had to bounce the regionserver. Could you explain what causes the
> > >> > coprocessor to hang for so long without even looping? Is it just
> > >> > stuck on the scan forever?
> > >> >
> > >> > When we then disable the fuzzy keys, the timeout does get executed.
> > >> > On further analysis we tried reducing fuzzy_value_cap and brought it
> > >> > down to 20.
> > >> > The problem is that when we switch fuzzy keys on and have filters that
> > >> > lead to an IN clause, the coprocessor is not deterministic: sometimes
> > >> > it goes into a spin and sometimes it executes fine. That is an issue
> > >> > because we need deterministic performance and do not want the
> > >> > coprocessor to run forever. Some queries run fine and are very fast,
> > >> > and some just get stuck forever.
> > >> >
> > >> > The client times out with an RPC timeout, but the coprocessor thread
> > >> > just hogs the CPU.
> > >> >
> > >> > Please comment.
> > >> >
> > >> > Thanks
> > >> >
> > >> >
> > >> > On Tue, Sep 22, 2015 at 7:14 AM, hongbin ma <[email protected]>
> > >> > wrote:
> > >> >
> > >> >> hi vipul,
> > >> >>
> > >> >> What version are you using? Before
> > >> >> https://issues.apache.org/jira/browse/KYLIN-740 we did spot some
> > >> >> critical performance issues caused by many IN clauses. If you could
> > >> >> provide a CPU/heap analysis (on your HBase region server), it would
> > >> >> be easier to address the problem.
> > >> >>
> > >> >> On Mon, Sep 21, 2015 at 10:42 PM, vipul jhawar <[email protected]>
> > >> >> wrote:
> > >> >>
> > >> >> > Hi
> > >> >> >
> > >> >> > We have noticed a pattern that causes the coprocessor to spike the
> > >> >> > regionserver CPU to 100% over time.
> > >> >> > If we issue a query through Kylin that involves scanning a lot of
> > >> >> > data (say, multiple days with multiple filters on many dimensions),
> > >> >> > it has to scan a large number of rows. If it does not return within
> > >> >> > the RPC timeout, the client does get an error message with the
> > >> >> > exception, but on the regionserver we see no end to the processing,
> > >> >> > and it ultimately hogs the regionserver.
> > >> >> >
> > >> >> > Are there any configs on the coprocessor that say: if the processing
> > >> >> > is not completed in N time, simply time out? That way we can look at
> > >> >> > the queries later but avoid the CPU spike, which makes the cluster
> > >> >> > unusable.
> > >> >> >
> > >> >> > Thanks
> > >> >> >
> > >> >>
> > >> >>
> > >> >>
> > >> >> --
> > >> >> Regards,
> > >> >>
> > >> >> *Bin Mahone | 马洪宾*
> > >> >> Apache Kylin: http://kylin.io
> > >> >> Github: https://github.com/binmahone
> > >> >>
> > >> >
> > >> >
> > >>
> > >>
> > >> --
> > >> Regards,
> > >>
> > >> *Bin Mahone | 马洪宾*
> > >> Apache Kylin: http://kylin.io
> > >> Github: https://github.com/binmahone
> > >>
> > >
> > >
> >
>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>
