Re: coprocessor cause 100% cpu

vipul jhawar Tue, 22 Sep 2015 08:36:01 -0700

Looks like attachments are stripped off the email.
Here is a screenshot -
https://monosnap.com/file/JmpHEMxJVVQUhTLxTrzE1sWDn7gXg4


On Tue, Sep 22, 2015 at 5:32 PM, vipul jhawar <[email protected]>
wrote:

> Hi hongbin
>
> It is attached in the previous reply.
> Attached again.
>
> Thanks
>
> On Tue, Sep 22, 2015 at 11:58 AM, hongbin ma <[email protected]> wrote:
>
>> hi
>>
>> did you forget to attach the screenshot?
>>
>> On Tue, Sep 22, 2015 at 12:11 PM, vipul jhawar <[email protected]>
>> wrote:
>>
>> > Hi
>> >
>> > We are on Kylin 0.7.2.
>> > A screenshot of the call stack is attached for reference.
>> >
>> > Yesterday we did some more debugging and added a timeout check in the
>> > coprocessor, in AggregationScanner -> buildAggrCache, similar to the
>> > checkMemoryUsage() check already there. But when we enabled fuzzy keys,
>> > the coprocessor simply remained stuck for hours.
>> > It does not even seem to be looping: with a timeout check of 1 min in
>> > place, the timeout never fired, yet the coprocessor was hung for a long
>> > time and we had to bounce the regionserver. Could you explain what
>> > causes the coprocessor to stay hung for so long without even going
>> > around the loop? Is it just stuck on the scan forever?
>> >
>> > After this, when we disabled the fuzzy keys, the timeout check did get
>> > executed.
>> > On further analysis we tried reducing fuzzy_value_cap and brought it
>> > down to 20.
>> > The problem is that when we switch on fuzzy keys and have filters that
>> > lead to an IN clause, the coprocessor is not deterministic: sometimes it
>> > goes into a spin and sometimes it executes fine. This is an issue
>> > because we need deterministic performance and do not want the
>> > coprocessor to run forever. Some queries run fine and are very fast,
>> > and some just get stuck forever.
>> >
>> > The client times out with an RPC timeout, but the coprocessor thread
>> > just hogs the CPU.
>> >
>> > Please comment.
>> >
>> > Thanks
>> >
>> >
>> > On Tue, Sep 22, 2015 at 7:14 AM, hongbin ma <[email protected]>
>> > wrote:
>> >
>> >> hi vipul,
>> >>
>> >> What version are you using? Before
>> >> https://issues.apache.org/jira/browse/KYLIN-740 we did spot some
>> >> critical performance issues caused by many IN clauses. If you could
>> >> provide a CPU/heap analysis (on your HBase region server), it would
>> >> be easier to address the problem.
>> >>
>> >> On Mon, Sep 21, 2015 at 10:42 PM, vipul jhawar <[email protected]>
>> >> wrote:
>> >>
>> >> > Hi
>> >> >
>> >> > We have noticed a pattern that causes the coprocessor to spike the
>> >> > regionserver CPU to 100% over time.
>> >> > If we issue a query through Kylin that scans a lot of data (say,
>> >> > multiple days with multiple filters on many dimensions), it has to
>> >> > scan a large number of rows. If it doesn't return within the RPC
>> >> > timeout, the client does get an error message with the exception,
>> >> > but on the regionserver we see no end to the processing and it
>> >> > ultimately hogs the regionserver.
>> >> >
>> >> > Are there any configs on the coprocessor that can be set so that, if
>> >> > processing is not completed within N time, it simply times out? That
>> >> > way we could look at the queries later but avoid the CPU spike, which
>> >> > makes the cluster unusable.
>> >> >
>> >> > Thanks
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Regards,
>> >>
>> >> *Bin Mahone | 马洪宾*
>> >> Apache Kylin: http://kylin.io
>> >> Github: https://github.com/binmahone
>> >>
>> >
>> >
>>
>>
>> --
>> Regards,
>>
>> *Bin Mahone | 马洪宾*
>> Apache Kylin: http://kylin.io
>> Github: https://github.com/binmahone
>>
>
>
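
One possible reading of the "timeout never happened" observation quoted above is that a deadline check placed in the loop body only runs between rows. If a single call into the underlying scanner never returns, for example because a filter keeps skipping rows inside that call, control never reaches the check. This is a guess, not a diagnosis. The sketch below shows only the shape of such a check. Every name in it (DeadlineScanSketch, scanWithDeadline, ScanTimeoutException) is hypothetical and none is taken from Kylin or HBase.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.LongSupplier;

// Hypothetical sketch; these names do not come from Kylin or HBase.
class DeadlineScanSketch {

    /** Thrown when the loop notices that its time budget is spent. */
    static class ScanTimeoutException extends RuntimeException {
        ScanTimeoutException(String message) {
            super(message);
        }
    }

    /**
     * Consumes all rows and returns how many were seen, or throws once the
     * clock passes the deadline. The clock is injected so the behaviour can
     * be exercised without real waiting.
     */
    static int scanWithDeadline(Iterator<Integer> rows, long timeoutMillis, LongSupplier clock) {
        long deadline = clock.getAsLong() + timeoutMillis;
        int seen = 0;
        // If hasNext() or next() never returns, nothing below it ever runs,
        // so the deadline check cannot fire.
        while (rows.hasNext()) {
            rows.next();
            seen++;
            // The check is only reached after a row has been handed back.
            if (clock.getAsLong() > deadline) {
                throw new ScanTimeoutException(
                        "scan exceeded " + timeoutMillis + " ms after " + seen + " rows");
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Iterator<Integer> rows = Arrays.asList(1, 2, 3, 4, 5).iterator();
        // Wall-clock time with a one-minute budget, as in the thread.
        System.out.println(scanWithDeadline(rows, 60_000L, System::currentTimeMillis));
    }
}
```

Under this reading, a check of this kind bounds the time spent between rows but not the time spent inside one scanner call, so a thread dump of the region server taken while the CPU is pegged would show which of the two is happening.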
