Re: coprocessor cause 100% cpu

vipul jhawar Mon, 21 Sep 2015 21:12:43 -0700

Hi

We are on Kylin 0.7.2.
A screenshot of the call stack is attached for reference.

Yesterday we did some more debugging and added a timeout check in the
coprocessor (AggregationScanner -> buildAggrCache), similar to the existing
checkMemoryUsage() check, but when we enabled fuzzy keys it simply remained
stuck for hours.
It does not even appear to be looping: with a timeout check of 1 min, the
timeout never fired, yet the coprocessor was hung for a long time and we
had to bounce the regionserver. Could you explain what causes the
coprocessor to remain hung for so long without even iterating the loop? Is
it just stuck on the scan forever?
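
A minimal sketch of this kind of in-loop timeout check, to show why it may
never fire. This is not the actual Kylin code: the class, the method
aggregateWithDeadline and the exception are hypothetical names, and a plain
Iterator stands in for the region scanner. The deadline is only evaluated
after each row is returned, so if a single call to next() never returns
(for example while the scanner is skipping rows internally), the check is
never reached.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; not taken from the Kylin source.
class TimeoutCheckSketch {

    // Hypothetical exception used to abort a scan that runs too long.
    static class ScanTimeoutException extends RuntimeException {
        ScanTimeoutException(String message) {
            super(message);
        }
    }

    // Sums the rows, checking the deadline once per row.
    // The check runs only between rows: if rows.next() itself never
    // returns, control never reaches the deadline test below.
    static int aggregateWithDeadline(Iterator<Integer> rows, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        int sum = 0;
        while (rows.hasNext()) {
            sum += rows.next(); // a stuck scanner would hang here
            if (System.currentTimeMillis() > deadline) {
                throw new ScanTimeoutException("scan exceeded " + timeoutMillis + " ms");
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Completes normally: three rows, generous deadline.
        System.out.println(aggregateWithDeadline(List.of(1, 2, 3).iterator(), 60_000L));
    }
}
```

If this is what is happening, a check placed inside the loop cannot help,
and the limit would have to be enforced inside or around the scan call
itself.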

After this, when we disabled the fuzzy keys, the timeout did get executed.
On further analysis we tried reducing fuzzy_value_cap and brought it down
to 20.
The problem is that when we switch on fuzzy keys and have filters that lead
to an IN clause, the coprocessor is not deterministic: sometimes it goes
into a spin and sometimes it executes fine. This is an issue because we
need deterministic performance and do not want the coprocessor to run
forever. Some queries run fine and are very fast, and some just get stuck
forever.
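
A rough illustration of why IN filters could inflate the number of fuzzy
keys. It assumes that each fuzzy key corresponds to one combination of IN
values across the filtered dimensions; that is an assumption made for the
example, not something confirmed from the Kylin source. The class and
method names are hypothetical, and the cap of 20 is the value mentioned
above.

```java
import java.util.List;

// Hypothetical illustration only; not taken from the Kylin source.
class FuzzyKeyCountSketch {

    // Number of value combinations, assuming one fuzzy key per combination
    // of IN values across dimensions (an assumption, see the note above).
    static long countCombinations(List<Integer> inListSizes) {
        long total = 1;
        for (int size : inListSizes) {
            total = Math.multiplyExact(total, (long) size);
        }
        return total;
    }

    // True when the number of combinations is larger than the cap.
    static boolean exceedsCap(List<Integer> inListSizes, long cap) {
        return countCombinations(inListSizes) > cap;
    }

    public static void main(String[] args) {
        // Three dimensions filtered with IN lists of 5, 4 and 3 values.
        List<Integer> sizes = List.of(5, 4, 3);
        System.out.println(countCombinations(sizes)); // 5 * 4 * 3 combinations
        System.out.println(exceedsCap(sizes, 20));
    }
}
```

Under that assumption the count grows multiplicatively with every
additional IN filter, so two otherwise similar queries can end up with very
different numbers of keys.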

The client times out with an RPC timeout, but the coprocessor thread just
keeps hogging the CPU.

Please comment.

Thanks

On Tue, Sep 22, 2015 at 7:14 AM, hongbin ma <[email protected]> wrote:

> hi vipul,
>
> What version are you using? Before
> https://issues.apache.org/jira/browse/KYLIN-740 we did spot some critical
> performance issues caused by many IN clauses. If you could provide a
> CPU/heap analysis (on your HBase region server), it would be easier to
> address the problem.
>
> On Mon, Sep 21, 2015 at 10:42 PM, vipul jhawar <[email protected]>
> wrote:
>
> > Hi
> >
> > We have noticed a pattern that causes the coprocessor to spike the
> > regionserver CPU to 100% over time.
> > If we issue a query through Kylin that involves scanning a lot of data
> > (say, multiple days with multiple filters on many dimensions), it has to
> > scan a large number of rows. If it doesn't return within the RPC timeout,
> > the client does get an error message with the exception, but on the
> > regionserver we see no end to the processing and it ultimately hogs the
> > regionserver.
> >
> > Are there any configs on the coprocessor to say that if the processing is
> > not completed within N time it should simply time out? That way we can
> > look at the queries later but avoid the CPU spike, which makes the
> > cluster unusable.
> >
> > Thanks
> >
>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>
