Looks like attachments are stripped off the email. Here is a screenshot - https://monosnap.com/file/JmpHEMxJVVQUhTLxTrzE1sWDn7gXg4
On Tue, Sep 22, 2015 at 5:32 PM, vipul jhawar <[email protected]> wrote:

> Hi hongbin
>
> It is attached in the previous reply.
> Attached again.
>
> Thanks
>
> On Tue, Sep 22, 2015 at 11:58 AM, hongbin ma <[email protected]> wrote:
>
>> hi
>>
>> did you forget to attach the screenshot?
>>
>> On Tue, Sep 22, 2015 at 12:11 PM, vipul jhawar <[email protected]> wrote:
>>
>> > Hi
>> >
>> > We are kylin 0.7.2 .
>> > A screenshot of the call stack is attached for reference.
>> >
>> > Yesterday we have done some more debugging and we added a timeout check in
>> > co processor AggregationScanner -> buildAggrCache
>> > similar to checkMemoryUsage() check in the co processor but when we
>> > enabled fuzzy keys it simply remains stuck for hours.
>> > It's not even looping as even when we added timeout checks of 1 min, the
>> > timeout never happened but the co processor was hung for a long time and we
>> > had to bounce the regionserver. If you could explain what is causing the co
>> > processor to remain hung for so long and not even loop in. Is it just stuck
>> > on the scan forever.
>> >
>> > After this when we disable the fuzzy keys, the timeout does get executed.
>> > On further analysis we tried to reduce the fuzzy_value_cap and brought it
>> > down to 20.
>> > The problem is that when we switch on fuzzy and have filters which lead to
>> > IN clause, the co processor is not deterministic and it goes into a spin
>> > sometimes and it executes fine sometimes which becomes an issue as we need
>> > deterministic performance and do not want to co processor to be running for
>> > ever. Some queries run fine and are very fast and some just get stuck
>> > forever.
>> >
>> > The client time out with an rpc timeout but the co processor thread just
>> > hogs the CPU.
>> >
>> > Please comment.
>> >
>> > Thanks
>> >
>> >
>> > On Tue, Sep 22, 2015 at 7:14 AM, hongbin ma <[email protected]> wrote:
>> >
>> >> hi vipul,
>> >>
>> >> what version are you using? before
>> >> https://issues.apache.org/jira/browse/KYLIN-740 we did spot some critical
>> >> performance issues caused by many IN clauses, if you could help to provide
>> >> a CPU/heap analysis(on your hbase's region server) it would be easier to
>> >> address the problem.
>> >>
>> >> On Mon, Sep 21, 2015 at 10:42 PM, vipul jhawar <[email protected]>
>> >> wrote:
>> >>
>> >> > Hi
>> >> >
>> >> > Have noticed a pattern that which caused the co processor to spike the
>> >> > regionserver cpu to 100% over time.
>> >> > If we end up issuing a query thru kylin which may involve a scanning a lot
>> >> > of data assuming multiple days with multiple filters for many dimensions in
>> >> > which case it has to scan a large number of rows and if it doesnt return in
>> >> > the required rpc timeout then the client does get an error message with the
>> >> > exception, but on the regionserver we see no end to processing and it
>> >> > ultimately hogs the regionserver.
>> >> >
>> >> > Are there any configs on the coprocessor which can be configured to say
>> >> > that if the processing is not completed in N time, then simply timeout as
>> >> > that way we can look at the queries later but avoid cpu spike as it makes
>> >> > the cluster unusable.
>> >> >
>> >> > Thanks
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Regards,
>> >>
>> >> *Bin Mahone | 马洪宾*
>> >> Apache Kylin: http://kylin.io
>> >> Github: https://github.com/binmahone
>> >>
>> >
>> >
>>
>>
>> --
>> Regards,
>>
>> *Bin Mahone | 马洪宾*
>> Apache Kylin: http://kylin.io
>> Github: https://github.com/binmahone
>>
>
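
For reference, the kind of timeout check described in the thread can be sketched in Java roughly as follows. This is a minimal illustration and not Kylin's code: `DeadlineScanSketch`, `countWithDeadline`, and `ScanTimeoutException` are hypothetical names, and a plain `Iterator` stands in for the region scanner. A check like this is cooperative, so it runs only between rows and cannot fire while a single `next()` call is still executing. That would be consistent with the 1 min timeout never triggering when fuzzy keys are enabled, though whether that is the actual cause here is an assumption.

```java
import java.util.Iterator;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a cooperative deadline check inside a scan loop.
final class DeadlineScanSketch {

    /** Thrown when the loop notices that its time budget is used up. */
    public static class ScanTimeoutException extends RuntimeException {
        public ScanTimeoutException(String message) {
            super(message);
        }
    }

    /**
     * Consumes all rows and returns how many were seen. The deadline is
     * checked once per row, so the check cannot run while a single
     * rows.next() call is blocked or spinning.
     */
    public static long countWithDeadline(Iterator<?> rows, long timeoutMillis) {
        final long deadlineNanos =
                System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        long count = 0;
        while (rows.hasNext()) {
            rows.next(); // stand-in for the per-row aggregation work
            count++;
            // Subtraction form stays correct even if nanoTime() wraps around.
            if (System.nanoTime() - deadlineNanos > 0) {
                throw new ScanTimeoutException(
                        "scan exceeded " + timeoutMillis + " ms after " + count + " rows");
            }
        }
        return count;
    }
}
```

If the time is being spent inside one scanner call rather than across many rows, a check of this shape would have to be complemented by a limit that applies inside the scan itself, or by interrupting the worker thread from outside.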
