[ https://issues.apache.org/jira/browse/KYLIN-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679387#comment-16679387 ]

Billy Liu commented on KYLIN-3672:
----------------------------------

Impressive. [~zonli] Could you share more info about the root cause? 

> Performance is poor when multiple queries occur in short period
> ---------------------------------------------------------------
>
>                 Key: KYLIN-3672
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3672
>             Project: Kylin
>          Issue Type: Bug
>          Components: Query Engine
>    Affects Versions: v2.5.0
>         Environment: CentOS 6.7, HBase 1.2.0+cdh5.14.2+456
>            Reporter: Zongwei Li
>            Assignee: Zongwei Li
>            Priority: Critical
>              Labels: patch, performance
>         Attachments: TrendChartBeforeFix.png
>
>
> Hi, Kylin Team
> We found a Kylin performance bug while tuning the performance of our BI
> report's integration with Kylin.
>  
> +Background+
> Our BI report shows usage reports to enterprise customers, with 15 usage
> charts on the report page.
> Each chart sends an API request to Kylin with a different SQL query, so a
> single user triggers 15 API calls (over JDBC) to Kylin.
> At our product's scale, each Kylin query node must support at least 20 users
> viewing the report at the same time.
> So each Kylin node should be able to handle 15 * 20 = 300 queries
> per second.
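
The sizing estimate above can be written as a small helper. This is a sketch only: the class and method names are hypothetical, and it assumes, as the 300 queries per second figure implicitly does, that every concurrent user loads the whole report (one query per chart) within the same second.

```java
/** Hypothetical helper that reproduces the per-node sizing estimate. */
final class RequiredThroughput {
    private RequiredThroughput() {}

    /**
     * Peak queries per second one node must absorb, assuming every concurrent
     * user loads the full report (one query per chart) within the same second.
     */
    static int requiredQps(int chartsPerReport, int concurrentUsers) {
        return chartsPerReport * concurrentUsers;
    }

    public static void main(String[] args) {
        // 15 charts per report page, 20 concurrent users per query node
        System.out.println(requiredQps(15, 20) + " queries per second"); // 300 queries per second
    }
}
```
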
>  
> +Performance Report+
> To reduce network impact, we set up the Kylin cluster and the testing machine
> in the same network as the Hadoop system.
> We used Gatling and JMeter to run several rounds of testing, with the
> following results.
>  
> |Threads|Handled Queries (in 60 seconds)|Handled Queries (per second)|Mean Response Time (ms)|
> |1|773|13|77|
> |15|3245|54|279|
> |25|3844|64|390|
> |50|4912|82|612|
> |75|5405|90|841|
> |100|5436|91|1108|
> |150|5434|91|1688|
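
The rows are consistent with Little's law for a closed-loop test: if each thread issues its next query as soon as the previous response arrives (an assumption about how the Gatling/JMeter scenarios were configured, i.e. zero think time), throughput is approximately threads / mean response time. The sketch below recomputes that prediction from the measured rows; the class name is hypothetical.

```java
/** Compares measured throughput with Little's law (X = N / R) for each test row. */
final class LittlesLawCheck {
    private LittlesLawCheck() {}

    // Rows from the report: threads, queries handled in 60 s, mean response time (ms).
    static final int[][] ROWS = {
        {1, 773, 77}, {15, 3245, 279}, {25, 3844, 390}, {50, 4912, 612},
        {75, 5405, 841}, {100, 5436, 1108}, {150, 5434, 1688},
    };

    /** Throughput measured over the 60-second window, in queries per second. */
    static double measuredQps(int handledIn60s) {
        return handledIn60s / 60.0;
    }

    /** Closed-loop prediction assuming zero think time: threads / response time (s). */
    static double predictedQps(int threads, int meanResponseMs) {
        return threads / (meanResponseMs / 1000.0);
    }

    public static void main(String[] args) {
        for (int[] r : ROWS) {
            System.out.printf("threads=%3d measured=%5.1f predicted=%5.1f%n",
                    r[0], measuredQps(r[1]), predictedQps(r[0], r[2]));
        }
    }
}
```

Every row agrees with the prediction to within about 2%, so the plateau is genuine saturation: beyond roughly 75 threads, extra threads only queue, and mean response time grows in proportion to the thread count (about 11 ms per thread, i.e. 1/90 s) while throughput stays near 90 queries per second.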
>  
> The trend chart is shown below:
> !TrendChartBeforeFix.png!
>  
> +Conclusion+
> The trend shows that once the thread count reaches 75, throughput peaks at
> about 90 queries per second and does not improve as more threads are added.
> Since each Kylin query engine handles at most 90 queries per second, a node
> can support only 90 / 15 = 6 users viewing the report page at the same time.
> Even with 3 query nodes, we could serve only 18 users concurrently, so this
> capacity does not meet our business requirements.
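
The capacity arithmetic in the conclusion can be sketched as follows. The helper names are hypothetical, and, like the 18-user figure, it assumes throughput scales linearly with the number of identical query nodes.

```java
/** Hypothetical capacity helper: how many concurrent report viewers a node supports. */
final class NodeCapacity {
    private NodeCapacity() {}

    /** Whole reports per second one node can serve (integer division rounds down). */
    static int usersPerNode(int peakQpsPerNode, int queriesPerReport) {
        return peakQpsPerNode / queriesPerReport;
    }

    /** Concurrent users a cluster of identical nodes can serve, assuming linear scaling. */
    static int clusterUsers(int nodes, int peakQpsPerNode, int queriesPerReport) {
        return nodes * usersPerNode(peakQpsPerNode, queriesPerReport);
    }

    public static void main(String[] args) {
        System.out.println(usersPerNode(90, 15));     // 6 users per node before the fix
        System.out.println(clusterUsers(3, 90, 15));  // 18 users on 3 nodes
        System.out.println(usersPerNode(350, 15));    // 23 users per node at 350 queries/s
    }
}
```

Applied to the 350+ queries per second reported after the fix, the same arithmetic gives at least 23 users per node, which clears the 20-user target.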
>  
> +Analyze+
> The test results show that a single thread gets fast responses, but as the
> thread count increases, Kylin's throughput does not grow as we expected.
> We did a full code review of the Kylin query engine and analyzed it with
> jstack and JProfiler, and found the root cause of this performance bottleneck.
> It is a regression introduced about a year ago by a new feature.
> With the fix applied, one Kylin node can handle 350+ queries per second. We
> are filing this bug to contribute the patch to Kylin.
>  
> +Kylin Server Info+
> |*Role*|*vCPU*|*Memory(GB)*|*Volume(GB)*|
> |Query Engine|16 (2.4 GHz)|128|1024|
>  
> +Kylin Package+
> apache-kylin-2.5.0-bin-cdh57.tar.gz (release package)
>  
> +Query SQL+
> Queries run with the PreparedStatement cache enabled (a new feature in Kylin
> 2.5.0; without the PreparedStatement cache, performance is even worse). The
> filter hits all 6 segments.
>  
> +Cube Info+
> Segment Number: 6
> Total Size: 47 MB
>  
> Segment: 20180101000000_20181011000000
> Start Time: 2018-01-01 00:00:00
> End Time: 2018-10-11 00:00:00
> Source Count: 351934019
> HBase Table: KYLIN_69Q9A850DZ
> Region Count: 1
> Size: 47 MB
>  
> Segment: 20181011000000_20181012000000
> Start Time: 2018-10-11 00:00:00
> End Time: 2018-10-12 00:00:00
> Source Count: 7085485
> HBase Table: KYLIN_ZCT39S8FUA
> Region Count: 1
> Size: less than 1 MB
>  
> Segment: 20181012000000_20181013000000
> Start Time: 2018-10-12 00:00:00
> End Time: 2018-10-13 00:00:00
> Source Count: 5534968
> HBase Table: KYLIN_RKRRLA958T
> Region Count: 1
> Size: less than 1 MB
>  
> Segment: 20181013000000_20181014000000
> Start Time: 2018-10-13 00:00:00
> End Time: 2018-10-14 00:00:00
> Source Count: 242856
> HBase Table: KYLIN_Q6DKCONN81
> Region Count: 1
> Size: less than 1 MB
>  
> Segment: 20181014000000_20181015000000
> Start Time: 2018-10-14 00:00:00
> End Time: 2018-10-15 00:00:00
> Source Count: 236122
> HBase Table: KYLIN_JY4WQD2MJH
> Region Count: 1
> Size: less than 1 MB
>  
> Segment: 20181015000000_20181016000000
> Start Time: 2018-10-15 00:00:00
> End Time: 2018-10-16 00:00:00
> Source Count: 6172353
> HBase Table: KYLIN_E2ELLINV22
> Region Count: 1
> Size: less than 1 MB
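
Each segment name above encodes its time range as two 14-digit yyyyMMddHHmmss timestamps joined by an underscore, matching the Start Time and End Time fields. The sketch below decodes that format and checks that the six ranges are contiguous; the format is inferred from this listing rather than taken from Kylin's source, and the class name is hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;

/** Hypothetical decoder for segment names of the form startTime_endTime. */
final class SegmentNames {
    private SegmentNames() {}

    // 14-digit timestamps as they appear in the listing, e.g. 20181011000000
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("uuuuMMddHHmmss");

    // The six segment names from the cube above, in order.
    static final List<String> SEGMENTS = Arrays.asList(
            "20180101000000_20181011000000",
            "20181011000000_20181012000000",
            "20181012000000_20181013000000",
            "20181013000000_20181014000000",
            "20181014000000_20181015000000",
            "20181015000000_20181016000000");

    /** Splits "start_end" and parses both halves; returns {start, end}. */
    static LocalDateTime[] parse(String segmentName) {
        String[] parts = segmentName.split("_");
        if (parts.length != 2) {
            throw new IllegalArgumentException("unexpected segment name: " + segmentName);
        }
        return new LocalDateTime[] {
            LocalDateTime.parse(parts[0], FMT), LocalDateTime.parse(parts[1], FMT)
        };
    }

    /** True if every segment ends exactly where the next one starts (no gaps or overlaps). */
    static boolean contiguous(List<String> names) {
        for (int i = 1; i < names.size(); i++) {
            LocalDateTime prevEnd = parse(names.get(i - 1))[1];
            LocalDateTime nextStart = parse(names.get(i))[0];
            if (!prevEnd.equals(nextStart)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String name : SEGMENTS) {
            LocalDateTime[] range = parse(name);
            System.out.println(range[0] + " -> " + range[1]);
        }
        System.out.println("contiguous: " + contiguous(SEGMENTS));
    }
}
```
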
>  
> +HBase Region Server+
> Count: 6
> hbase.regionserver.handler.count: 120
> Region servers were not blocked in coprocessor RPC calls.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
