[ https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679387#comment-16679387 ]
Billy Liu commented on KYLIN-3672: ---------------------------------- Impressive. [~zonli] Could you share more info about the root cause? > Performance is poor when multiple queries occur in short period > --------------------------------------------------------------- > > Key: KYLIN-3672 > URL: https://issues.apache.org/jira/browse/KYLIN-3672 > Project: Kylin > Issue Type: Bug > Components: Query Engine > Affects Versions: v2.5.0 > Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456 > Reporter: Zongwei Li > Assignee: Zongwei Li > Priority: Critical > Labels: patch, performance > Attachments: TrendChartBeforeFix.png > > > Hi, Kylin Team > We found one Kylin performance bug during performance tuning for our BI > report integrate with Kylin. > > +Background+ > Our BI report show customer usage report to enterprise customers, provide 15 > usage charts in report page. > Each chart need send API request to Kylin with different SQLs. So it means > for one user, it will trigger 15 API calls(by JDBC) to Kylin. > For our product scale, we need support at least 20 users to review the report > at same time for each Kylin query node. > So it means each Kylin node should be able to handle 15 * 20 = 300 queries > per second. > > +Performance Report+ > To reduce the network impact. We built up Kylin cluster and testing machine > in the same network with Hadoop system. > We use gatling and Jmeter tools to do several round testing, result as follow. > > |Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean > Response Time > (ms)| > |1|773|13|77| > |15|3245|54|279| > |25|3844|64|390| > |50|4912|82|612| > |75|5405|90|841| > |100|5436|91|1108| > |150|5434|91|1688| > > And draw the trend chart as follow: > !TrendChartBeforeFix.png! > > +Conclusion+ > From the trend, when the thread count reach 75, the handled queries per > second reaches peak data 90, and cannot improved by increase the thread count. > Each Kylin query engine can handle 90 queries per second, it means only > support 90/15 = 6 users to review report page at same time. > Even we setup 3 query nodes, can extend to 18 users at same time, this > performance capacity cannot meet our business requirement. > > +Analyze+ > From test result, response for one thread is fast, but as the thread > increase, throughput of Kylin not increased as we expected. > We have full code review for Kylin query engine, and use Jstack and JProfile > to do analyze, found the root cause for this performance bottleneck. > This is one regression bug introduced by new feature involved one year before. > With bug fixing, one Kylin node can handle 350+ queries per second. Submit > this bug for contribute patch to Kylin. > > +Kylin Server Info+ > |*Role*|*vCPU*|*Memory(GB)*|*Volume(GB)*| > |Query Engine|16 (2.4G)|128|1024| > > +Kylin Package+ > apache-kylin-2.5.0-bin-cdh57.tar.gz (release package) > > +Query SQL+ > SQL with PreparedStatement cache enabled. (New feature in Kylin 2.5.0. If no > PreparedStatement cache, performance will be more worse). Filter will hit all > 6 segments. > > +Cube Info+ > Segment Number: 6 Total Size: 47 MB > > Segment: 20180101000000_20181011000000 > Start Time: 2018-01-01 00:00:00 > End Time: 2018-10-11 00:00:00 > Source Count: 351934019 > HBase Table: KYLIN_69Q9A850DZ > Region Count: 1 > Size: 47 MB > > Segment: 20181011000000_20181012000000 > Start Time: 2018-10-11 00:00:00 > End Time: 2018-10-12 00:00:00 > Source Count: 7085485 > HBase Table: KYLIN_ZCT39S8FUA > Region Count: 1 > Size: less than 1 MB > > > Segment: 20181012000000_20181013000000 > Start Time: 2018-10-12 00:00:00 > End Time: 2018-10-13 00:00:00 > Source Count: 5534968 > HBase Table: KYLIN_RKRRLA958T > Region Count: 1 > Size: less than 1 MB > > Segment: 20181013000000_20181014000000 > Start Time: 2018-10-13 00:00:00 > End Time: 2018-10-14 00:00:00 > Source Count: 242856 > HBase Table: KYLIN_Q6DKCONN81 > Region Count: 1 > Size: less than 1 MB > > Segment: 20181014000000_20181015000000 > Start Time: 2018-10-14 00:00:00 > End Time: 2018-10-15 00:00:00 > Source Count: 236122 > HBase Table: KYLIN_JY4WQD2MJH > Region Count: 1 > Size: less than 1 MB > > Segment: 20181015000000_20181016000000 > Start Time: 2018-10-15 00:00:00 > End Time: 2018-10-16 00:00:00 > Source Count: 6172353 > HBase Table: KYLIN_E2ELLINV22 > Region Count: 1 > Size: less than 1 MB > > +HBase Region Server+ > Count: 6 > hbase.regionserver.handler.count: 120. > Not blocked in CoProcessor RPC call. -- This message was sent by Atlassian JIRA (v7.6.3#76005)