[ 
https://issues.apache.org/jira/browse/HIVE-22975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051080#comment-17051080
 ] 

Rajesh Balamohan commented on HIVE-22975:
-----------------------------------------

With the patch, CPU usage for {{TopNKeyFilter.canForward}} went down from {{15%}} to {{8%}}.

With Q43 in my cluster,

DAG runtime with patch:
 R1: 88.34
 R2: 88.55
 R3: 88.07
 R4: 88.01
 R5: 87.96

DAG runtime without patch:
 R1: 95.46
 R2: 95.83
 R3: 96.77
 R4: 96.24
 R5: 95.13

~70% of the comparisons are now resolved via boundary checks. This can be observed from 
the logs (i.e., the eff and total counters below).

 
{noformat}
<14>1 2020-03-04T10:14:09.776Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_220_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@a336438, 
topN=100, repeated=4240631, added=200, total=14045075, eff=9804244, 
forwardingRatio=0.30194435}
<14>1 2020-03-04T10:14:10.906Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_140_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@1d1a5be7, 
topN=100, repeated=4925112, added=234, total=17266776, eff=12341430, 
forwardingRatio=0.2852499}
<14>1 2020-03-04T10:14:11.623Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_210_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@80d07c2, 
topN=100, repeated=5278609, added=221, total=16243904, eff=10965074, 
forwardingRatio=0.324973}
<14>1 2020-03-04T10:14:12.324Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_180_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@22401517, 
topN=100, repeated=5481458, added=222, total=17540574, eff=12058894, 
forwardingRatio=0.31251428}
<14>1 2020-03-04T10:14:29.713Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_240_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@68c1743b, 
topN=100, repeated=3730824, added=207, total=12332974, eff=8601943, 
forwardingRatio=0.30252483}
<14>1 2020-03-04T10:14:31.570Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_250_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@6b5cce3a, 
topN=100, repeated=3506980, added=225, total=12343254, eff=8836049, 
forwardingRatio=0.28413942}
<14>1 2020-03-04T10:14:32.723Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_300_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@3213767, 
topN=100, repeated=3707044, added=214, total=11707241, eff=7999983, 
forwardingRatio=0.31666368}
<14>1 2020-03-04T10:14:32.942Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_290_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@688ca605, 
topN=100, repeated=3794086, added=216, total=12343961, eff=8549659, 
forwardingRatio=0.30738124}
<14>1 2020-03-04T10:14:33.895Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_270_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@66031ad4, 
topN=100, repeated=3929489, added=214, total=13527930, eff=9598227, 
forwardingRatio=0.29048812}
<14>1 2020-03-04T10:14:34.534Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_280_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@11b29a09, 
topN=100, repeated=4104207, added=212, total=13709811, eff=9605392, 
forwardingRatio=0.29937825}
<14>1 2020-03-04T10:14:34.734Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_260_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@7926ef7, 
topN=100, repeated=3946203, added=209, total=13548849, eff=9602437, 
forwardingRatio=0.29127285}
<14>1 2020-03-04T10:14:34.859Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_340_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@3cf9ac12, 
topN=100, repeated=3514641, added=219, total=11547405, eff=8032545, 
forwardingRatio=0.30438527}
<14>1 2020-03-04T10:14:35.559Z query-executor-0-0 query-executor 1 
6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 
class="vector.VectorTopNKeyOperator" level="INFO" 
thread="TezTR-731251_9_1_2_310_0"] Closing TopNKeyFilter: 
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@2ca1ee62, 
topN=100, repeated=3952264, added=220, total=13634332, eff=9681848, 
forwardingRatio=0.28989202}
{noformat}
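
As a quick cross-check of the ~70% figure, the counters can be pulled out of the "Closing TopNKeyFilter" lines and the eff/total share computed per task. The sketch below is only an illustration: the helper names ({{parse_counters}}, {{eff_share}}) are made up for this example, and the sample text is the counter portion of the first two log entries above. In these samples the numbers also happen to satisfy eff = total - repeated - added; that is an observation from the posted values, not a statement about how the counters are defined in the patch.

```python
import re

# Counter portion of the first two "Closing TopNKeyFilter" entries above.
SAMPLE = """\
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@a336438, topN=100, repeated=4240631, added=200, total=14045075, eff=9804244, forwardingRatio=0.30194435}
TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@1d1a5be7, topN=100, repeated=4925112, added=234, total=17266776, eff=12341430, forwardingRatio=0.2852499}
"""

# Integer counters of interest; forwardingRatio is a float and is skipped.
COUNTER = re.compile(r"\b(repeated|added|total|eff)=(\d+)")


def parse_counters(text):
    """Return one dict of integer counters per TopNKeyFilter{...} entry.

    re.S lets an entry span several lines, as in the wrapped log output.
    """
    rows = []
    for body in re.findall(r"TopNKeyFilter\{(.*?)\}", text, flags=re.S):
        rows.append({name: int(value) for name, value in COUNTER.findall(body)})
    return rows


def eff_share(row):
    """Fraction of all checked rows that were counted under eff."""
    return row["eff"] / row["total"]


for row in parse_counters(SAMPLE):
    print(f"total={row['total']} eff={row['eff']} share={eff_share(row):.3f}")
```

Running the same helper over the full log block gives one share per task, which makes it easy to see how much the ratio varies between tasks.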

> Optimise TopNKeyFilter with boundary checks
> -------------------------------------------
>
>                 Key: HIVE-22975
>                 URL: https://issues.apache.org/jira/browse/HIVE-22975
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HIVE-22975.1.patch, Screenshot 2020-03-04 at 3.26.45 PM.jpg
>
>
> !Screenshot 2020-03-04 at 3.26.45 PM.jpg|width=507,height=322!
>  
> It would be good to add boundary checks to reduce cycles spent on the topN 
> filter. E.g., Q43 spends a good amount of time in topN.
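
To illustrate the general idea of a boundary check in a top-N key filter, here is a minimal sketch. It is not the code from HIVE-22975.1.patch and not Hive's {{TopNKeyFilter}}: the class name, the ascending sort order, the sorted-list storage, and the {{boundary_hits}} counter are all assumptions made for this example. The point is only that, once N distinct keys are held, a key that sorts after the current largest kept key can be rejected with one comparison instead of a lookup in the sorted structure.

```python
import bisect


class TopNFilterSketch:
    """Hypothetical top-N key filter (ascending order); not Hive's class."""

    def __init__(self, top_n):
        self.top_n = top_n
        self.keys = []          # sorted distinct keys, at most top_n long
        self.total = 0          # rows checked
        self.boundary_hits = 0  # rows rejected by the boundary comparison alone

    def can_forward(self, key):
        self.total += 1
        if self.top_n <= 0:
            return False
        # Boundary check: with a full set, anything beyond the largest
        # kept key cannot be in the top N, so skip the sorted lookup.
        if len(self.keys) == self.top_n and key > self.keys[-1]:
            self.boundary_hits += 1
            return False
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return True  # key is already tracked, forward the row
        self.keys.insert(i, key)
        if len(self.keys) > self.top_n:
            self.keys.pop()  # evict the largest key to stay at top_n
        return True


f = TopNFilterSketch(top_n=2)
decisions = [f.can_forward(k) for k in [5, 3, 9, 1, 3, 7]]
print(decisions, f.keys, f.boundary_hits, f.total)
```

On skewed or large inputs most rows fall beyond the boundary once the set fills up, which is why a single comparison on the hot path can noticeably cut the time spent in the filter.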



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
