[ https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679449#comment-16679449 ]
Zongwei Li commented on KYLIN-3672:
-----------------------------------
[~Shaofengshi] I have merged my change with the latest code in master and
attached the generated patch file to this JIRA. Who can help review it, and is
there anything else I need to do? This is my first time submitting a patch for
Kylin.
[~yimingliu] Let me add the detailed code analysis for this bug.
> Performance is poor when multiple queries occur in short period
> ---------------------------------------------------------------
>
> Key: KYLIN-3672
> URL: https://issues.apache.org/jira/browse/KYLIN-3672
> Project: Kylin
> Issue Type: Bug
> Components: Query Engine
> Affects Versions: v2.5.0
> Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
> Reporter: Zongwei Li
> Assignee: Zongwei Li
> Priority: Critical
> Labels: patch, performance
> Attachments: KYLIN-3672.master.001.patch, TrendChartBeforeFix.png
>
>
> Hi, Kylin Team
> We found a Kylin performance bug while tuning the performance of our BI
> report, which is integrated with Kylin.
>
> +Background+
> Our BI report shows usage data to enterprise customers and provides 15
> usage charts on the report page.
> Each chart sends an API request to Kylin with a different SQL statement, so
> one user viewing the page triggers 15 API calls (via JDBC) to Kylin.
> At our product scale, each Kylin query node needs to support at least 20
> users viewing the report at the same time.
> This means each Kylin node should be able to handle 15 * 20 = 300 queries
> per second.
>
> +Performance Report+
> To reduce the impact of the network, we built the Kylin cluster and the test
> machine in the same network as the Hadoop system.
> We ran several rounds of tests with Gatling and JMeter. The results are as
> follows.
>
> |Thread|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean Response Time (ms)|
> |1|773|13|77|
> |15|3245|54|279|
> |25|3844|64|390|
> |50|4912|82|612|
> |75|5405|90|841|
> |100|5436|91|1108|
> |150|5434|91|1688|
>
> The trend chart is as follows:
> !TrendChartBeforeFix.png!
>
> +Conclusion+
> The trend shows that when the thread count reaches 75, throughput peaks at
> about 90 queries per second and does not improve as the thread count
> increases further.
> Since each Kylin query engine can handle about 90 queries per second, it can
> support only 90/15 = 6 users viewing the report page at the same time.
> Even if we set up 3 query nodes, which extends this to 18 users at the same
> time, this capacity cannot meet our business requirement.
>
> +Analysis+
> The test results show that the response for a single thread is fast, but as
> the thread count increases, Kylin's throughput does not increase as we
> expected.
> We did a full code review of the Kylin query engine and used jstack and
> JProfiler to analyze it, and found the root cause of this performance
> bottleneck.
> It is a regression introduced by a new feature added about a year ago.
> With the fix, one Kylin node can handle 350+ queries per second. We are
> submitting this issue to contribute the patch to Kylin.
>
> +Kylin Server Info+
> |*Role*|*vCPU*|*Memory(GB)*|*Volume(GB)*|
> |Query Engine|16 (2.4G)|128|1024|
>
> +Kylin Package+
> apache-kylin-2.5.0-bin-cdh57.tar.gz (release package)
>
> +Query SQL+
> The SQL runs with the PreparedStatement cache enabled (a new feature in
> Kylin 2.5.0; without the PreparedStatement cache, performance is even
> worse). The filter hits all 6 segments.
>
> +Cube Info+
> Segment Number: 6 Total Size: 47 MB
>
> Segment: 20180101000000_20181011000000
> Start Time: 2018-01-01 00:00:00
> End Time: 2018-10-11 00:00:00
> Source Count: 351934019
> HBase Table: KYLIN_69Q9A850DZ
> Region Count: 1
> Size: 47 MB
>
> Segment: 20181011000000_20181012000000
> Start Time: 2018-10-11 00:00:00
> End Time: 2018-10-12 00:00:00
> Source Count: 7085485
> HBase Table: KYLIN_ZCT39S8FUA
> Region Count: 1
> Size: less than 1 MB
>
>
> Segment: 20181012000000_20181013000000
> Start Time: 2018-10-12 00:00:00
> End Time: 2018-10-13 00:00:00
> Source Count: 5534968
> HBase Table: KYLIN_RKRRLA958T
> Region Count: 1
> Size: less than 1 MB
>
> Segment: 20181013000000_20181014000000
> Start Time: 2018-10-13 00:00:00
> End Time: 2018-10-14 00:00:00
> Source Count: 242856
> HBase Table: KYLIN_Q6DKCONN81
> Region Count: 1
> Size: less than 1 MB
>
> Segment: 20181014000000_20181015000000
> Start Time: 2018-10-14 00:00:00
> End Time: 2018-10-15 00:00:00
> Source Count: 236122
> HBase Table: KYLIN_JY4WQD2MJH
> Region Count: 1
> Size: less than 1 MB
>
> Segment: 20181015000000_20181016000000
> Start Time: 2018-10-15 00:00:00
> End Time: 2018-10-16 00:00:00
> Source Count: 6172353
> HBase Table: KYLIN_E2ELLINV22
> Region Count: 1
> Size: less than 1 MB
>
> +HBase Region Server+
> Count: 6
> hbase.regionserver.handler.count: 120.
> Queries are not blocked in the CoProcessor RPC call.
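
The capacity arithmetic in the quoted Performance Report and Conclusion can be
re-derived with a short script. This is only a sketch: the numbers are copied
from the table in the report, while the function and constant names are
hypothetical and not part of Kylin or of the attached patch. The last helper
applies Little's law (in-flight requests = throughput * mean response time) as
a rough consistency check, assuming the load tool keeps a fixed number of
threads busy.

```python
# Rows copied from the report: (threads, queries handled in 60 s, mean response time in ms)
RESULTS = [
    (1, 773, 77),
    (15, 3245, 279),
    (25, 3844, 390),
    (50, 4912, 612),
    (75, 5405, 841),
    (100, 5436, 1108),
    (150, 5434, 1688),
]

TEST_SECONDS = 60            # length of each test round
QUERIES_PER_USER = 15        # one query per chart on the report page
TARGET_USERS_PER_NODE = 20   # requirement stated in the Background section


def queries_per_second(handled):
    """Average throughput over one test round."""
    return handled / TEST_SECONDS


def concurrent_users(qps):
    """Users one node can serve if each user triggers QUERIES_PER_USER queries."""
    return int(qps // QUERIES_PER_USER)


def implied_concurrency(handled, mean_ms):
    """Little's law estimate of in-flight requests; should be close to the thread count."""
    return queries_per_second(handled) * mean_ms / 1000.0


if __name__ == "__main__":
    print("target queries per second per node:",
          QUERIES_PER_USER * TARGET_USERS_PER_NODE)
    for threads, handled, mean_ms in RESULTS:
        qps = queries_per_second(handled)
        print(threads, round(qps), concurrent_users(qps),
              round(implied_concurrency(handled, mean_ms), 1))
```

The implied concurrency tracks the thread count closely in every row, which
fits the reading that beyond about 75 threads the extra requests only wait
longer while throughput stays flat.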
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)