[ https://issues.apache.org/jira/browse/HIVE-22975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051080#comment-17051080 ]
Rajesh Balamohan commented on HIVE-22975:
-----------------------------------------

With the patch, CPU usage for {{TopNKeyFilter.canForward}} went down from {{15% -> 8%}}.

With Q43 in my cluster, DAG runtimes were:

DAG runtime with patch:
R1: 88.34
R2: 88.55
R3: 88.07
R4: 88.01
R5: 87.96

DAG runtime without patch:
R1: 95.46
R2: 95.83
R3: 96.77
R4: 96.24
R5: 95.13

That is a mean of 88.19 with the patch versus 95.89 without, roughly 8% lower.

~70% of the comparisons are now resolved by the boundary checks alone. This can be seen from the {{eff}} and {{total}} counters in the logs below: eff/total ranges from about 0.68 to 0.72 across tasks (e.g. 9804244/14045075 ≈ 0.70).

{noformat}
<14>1 2020-03-04T10:14:09.776Z query-executor-0-0 query-executor 1 6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 class="vector.VectorTopNKeyOperator" level="INFO" thread="TezTR-731251_9_1_2_220_0"] Closing TopNKeyFilter: TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@a336438, topN=100, repeated=4240631, added=200, total=14045075, eff=9804244, forwardingRatio=0.30194435}
<14>1 2020-03-04T10:14:10.906Z query-executor-0-0 query-executor 1 6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 class="vector.VectorTopNKeyOperator" level="INFO" thread="TezTR-731251_9_1_2_140_0"] Closing TopNKeyFilter: TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@1d1a5be7, topN=100, repeated=4925112, added=234, total=17266776, eff=12341430, forwardingRatio=0.2852499}
<14>1 2020-03-04T10:14:11.623Z query-executor-0-0 query-executor 1 6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 class="vector.VectorTopNKeyOperator" level="INFO" thread="TezTR-731251_9_1_2_210_0"] Closing TopNKeyFilter: TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@80d07c2, topN=100, repeated=5278609, added=221, total=16243904, eff=10965074, forwardingRatio=0.324973}
<14>1 2020-03-04T10:14:12.324Z query-executor-0-0 query-executor 1 6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 class="vector.VectorTopNKeyOperator" level="INFO" thread="TezTR-731251_9_1_2_180_0"] Closing TopNKeyFilter: TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@22401517, topN=100, repeated=5481458, added=222, total=17540574, eff=12058894, forwardingRatio=0.31251428}
<14>1 2020-03-04T10:14:29.713Z query-executor-0-0 query-executor 1 6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 class="vector.VectorTopNKeyOperator" level="INFO" thread="TezTR-731251_9_1_2_240_0"] Closing TopNKeyFilter: TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@68c1743b, topN=100, repeated=3730824, added=207, total=12332974, eff=8601943, forwardingRatio=0.30252483}
<14>1 2020-03-04T10:14:31.570Z query-executor-0-0 query-executor 1 6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 class="vector.VectorTopNKeyOperator" level="INFO" thread="TezTR-731251_9_1_2_250_0"] Closing TopNKeyFilter: TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@6b5cce3a, topN=100, repeated=3506980, added=225, total=12343254, eff=8836049, forwardingRatio=0.28413942}
<14>1 2020-03-04T10:14:32.723Z query-executor-0-0 query-executor 1 6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 class="vector.VectorTopNKeyOperator" level="INFO" thread="TezTR-731251_9_1_2_300_0"] Closing TopNKeyFilter: TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@3213767, topN=100, repeated=3707044, added=214, total=11707241, eff=7999983, forwardingRatio=0.31666368}
<14>1 2020-03-04T10:14:32.942Z query-executor-0-0 query-executor 1 6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 class="vector.VectorTopNKeyOperator" level="INFO" thread="TezTR-731251_9_1_2_290_0"] Closing TopNKeyFilter: TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@688ca605, topN=100, repeated=3794086, added=216, total=12343961, eff=8549659, forwardingRatio=0.30738124}
<14>1 2020-03-04T10:14:33.895Z query-executor-0-0 query-executor 1 6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 class="vector.VectorTopNKeyOperator" level="INFO" thread="TezTR-731251_9_1_2_270_0"] Closing TopNKeyFilter: TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@66031ad4, topN=100, repeated=3929489, added=214, total=13527930, eff=9598227, forwardingRatio=0.29048812}
<14>1 2020-03-04T10:14:34.534Z query-executor-0-0 query-executor 1 6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 class="vector.VectorTopNKeyOperator" level="INFO" thread="TezTR-731251_9_1_2_280_0"] Closing TopNKeyFilter: TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@11b29a09, topN=100, repeated=4104207, added=212, total=13709811, eff=9605392, forwardingRatio=0.29937825}
<14>1 2020-03-04T10:14:34.734Z query-executor-0-0 query-executor 1 6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 class="vector.VectorTopNKeyOperator" level="INFO" thread="TezTR-731251_9_1_2_260_0"] Closing TopNKeyFilter: TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@7926ef7, topN=100, repeated=3946203, added=209, total=13548849, eff=9602437, forwardingRatio=0.29127285}
<14>1 2020-03-04T10:14:34.859Z query-executor-0-0 query-executor 1 6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 class="vector.VectorTopNKeyOperator" level="INFO" thread="TezTR-731251_9_1_2_340_0"] Closing TopNKeyFilter: TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@3cf9ac12, topN=100, repeated=3514641, added=219, total=11547405, eff=8032545, forwardingRatio=0.30438527}
<14>1 2020-03-04T10:14:35.559Z query-executor-0-0 query-executor 1 6f75c674-5e00-11ea-8e11-06a655adac02 [mdc@18060 class="vector.VectorTopNKeyOperator" level="INFO" thread="TezTR-731251_9_1_2_310_0"] Closing TopNKeyFilter: TopNKeyFilter{id=org.apache.hadoop.hive.ql.exec.TopNKeyFilter@2ca1ee62, topN=100, repeated=3952264, added=220, total=13634332, eff=9681848, forwardingRatio=0.28989202}
{noformat}

> Optimise TopNKeyFilter with boundary checks
> -------------------------------------------
>
>                 Key: HIVE-22975
>                 URL: https://issues.apache.org/jira/browse/HIVE-22975
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HIVE-22975.1.patch, Screenshot 2020-03-04 at 3.26.45 PM.jpg
>
>
> !Screenshot 2020-03-04 at 3.26.45 PM.jpg|width=507,height=322!
>
> It would be good to add boundary checks to reduce the cycles spent in the topN
> filter. E.g. Q43 spends a good amount of time in topN.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
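The boundary-check idea from the description can be illustrated with a minimal Java sketch: once the filter already holds topN distinct keys, a single comparison against the worst retained key (the "boundary") rejects every key that is strictly worse, so only keys at or above the boundary pay for a lookup in the top-N structure. This is not the code in HIVE-22975.1.patch. The class BoundedTopNKeyFilter, its TreeSet of distinct keys, and the demo class are hypothetical. The counters total, eff, repeated and added only borrow the names from the logs above. In every logged line total = eff + repeated + added (e.g. 9804244 + 4240631 + 200 = 14045075), and the sketch keeps that identity, but whether Hive defines the counters exactly this way is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of a top-N key filter with a boundary check.
// It is NOT Hive's TopNKeyFilter; the counters only mirror the log field names.
final class BoundedTopNKeyFilter<K> {
    private final int topN;
    private final Comparator<? super K> comparator;
    // At most topN distinct keys, ordered best-first by the comparator.
    private final TreeSet<K> topKeys;
    // Cached worst retained key; non-null whenever topKeys holds topN keys.
    private K boundary;

    long total;    // every canForward call
    long eff;      // rejected by the single boundary comparison
    long repeated; // key was already retained
    long added;    // key was inserted into topKeys (it may be evicted later)

    BoundedTopNKeyFilter(int topN, Comparator<? super K> comparator) {
        if (topN <= 0) {
            throw new IllegalArgumentException("topN must be positive");
        }
        this.topN = topN;
        this.comparator = comparator;
        this.topKeys = new TreeSet<>(comparator);
    }

    // Returns false only if the key is strictly worse than topN distinct keys
    // already seen, so it can never be part of the final top N.
    boolean canForward(K key) {
        total++;
        // Boundary check: once the set is full, one comparison against the
        // worst retained key rejects everything strictly worse than it.
        if (topKeys.size() == topN && comparator.compare(key, boundary) > 0) {
            eff++;
            return false;
        }
        // Slow path: O(log topN) TreeSet lookup/insert.
        if (!topKeys.add(key)) {
            repeated++;
        } else {
            added++;
            if (topKeys.size() > topN) {
                topKeys.pollLast(); // evict the previous boundary
            }
        }
        if (topKeys.size() == topN) {
            boundary = topKeys.last(); // refresh the cached boundary
        }
        return true;
    }
}

final class BoundedTopNKeyFilterDemo {
    public static void main(String[] args) {
        BoundedTopNKeyFilter<Integer> filter =
                new BoundedTopNKeyFilter<>(3, Comparator.<Integer>naturalOrder());
        List<Integer> forwarded = new ArrayList<>();
        for (int key : new int[] {5, 1, 9, 3, 7, 2, 8, 1, 6}) {
            if (filter.canForward(key)) {
                forwarded.add(key);
            }
        }
        // Prints: forwarded=[5, 1, 9, 3, 2, 1] total=9 eff=3 repeated=1 added=5
        System.out.println("forwarded=" + forwarded + " total=" + filter.total
                + " eff=" + filter.eff + " repeated=" + filter.repeated + " added=" + filter.added);
    }
}
```

The filter is conservative: 5 and 9 are forwarded in the demo and later evicted, so an operator downstream must still compute the exact top N. The saving comes from the fast path only, which is why it grows with the share of keys rejected by the boundary (eff/total).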