[ https://issues.apache.org/jira/browse/HIVE-15682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863137#comment-15863137 ]

Xuefu Zhang commented on HIVE-15682:
------------------------------------

Hi [~dapengsun], Thank you very much for running the benchmarks. Overall the 
numbers look good, showing that the performance penalty is insignificant 
except for a few queries. There are two observations:

1. HIVE-15682 is supposed to offer at least equivalent performance, yet the 
results show some performance degradation on quite a few queries. I'm not sure 
whether this is due to cluster performance variations.

2. HIVE-15580 showed some performance degradation on queries with group by. I'm 
not sure whether this has shown up in your results.

While you still have the setup, could you please run queries similar to the 
ones I used in this JIRA and in HIVE-15683 to confirm my measurements?

Thanks!

> Eliminate per-row based dummy iterator creation
> -----------------------------------------------
>
>                 Key: HIVE-15682
>                 URL: https://issues.apache.org/jira/browse/HIVE-15682
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>    Affects Versions: 2.2.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15682.patch
>
>
> HIVE-15580 introduced a dummy iterator per input row, which can be eliminated 
> because {{SparkReduceRecordHandler}} is able to handle single key-value 
> pairs. We can refactor this part of the code 1. to remove the need for an 
> iterator and 2. to optimize the code path for per (key, value) processing 
> (instead of (key, value iterator) processing). It would also be great if we 
> could measure the performance after the optimizations and compare it to the 
> performance prior to HIVE-15580.
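
For readers following along, a minimal sketch of the pattern being discussed (the class and
method names below are hypothetical, not the actual Hive code): wrapping every single value in
a throwaway iterator allocates an extra object per input row, while a direct (key, value)
overload avoids that allocation entirely.

    // Hypothetical sketch only; not the real SparkReduceRecordHandler implementation.
    import java.util.Collections;
    import java.util.Iterator;

    class ReduceRecordHandlerSketch {

        // Existing path: processes a key together with an iterator over its values.
        void processRow(Object key, Iterator<Object> values) {
            while (values.hasNext()) {
                reduce(key, values.next());
            }
        }

        // Per-row dummy iterator pattern: each single value is wrapped in a
        // one-element iterator, creating a short-lived object for every input row.
        void processSingleRowViaDummyIterator(Object key, Object value) {
            processRow(key, Collections.singletonList(value).iterator());
        }

        // Optimized path suggested in this issue: handle the single (key, value)
        // pair directly, with no per-row iterator allocation.
        void processRow(Object key, Object value) {
            reduce(key, value);
        }

        private void reduce(Object key, Object value) {
            // placeholder for the reduce-side operator pipeline
        }
    }

The benchmark question above is essentially whether removing that per-row allocation
restores the performance seen prior to HIVE-15580.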



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
