[ 
https://issues.apache.org/jira/browse/HIVE-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860243#comment-15860243
 ] 

Xuefu Zhang commented on HIVE-15796:
------------------------------------

Thanks for working on this, [~csun]! Had a quick pass, and here are some 
thoughts to share:

1. Since we are changing the behavior, I think the default value should retain 
the old behavior.
2. In setSparkReduceParallelism, I think it's probably cleaner to have a 
separate method for the new implementation rather than using several if/else 
blocks.
3. Looking at the test output diff, I'm not sure whether the change in the 
number of reducers is due to the new implementation.
4. Wondering whether using the default graph walker is strictly required, or 
whether it just brings some benefit to the new implementation.
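
To illustrate points 1 and 2, here is a rough Java sketch, not the actual patch. Everything in it is hypothetical: the class, the flag name, the method names, and the ceil(bytes / bytesPerReducer) estimate are stand-ins rather than Hive code. It only shows the shape being suggested: a switch that defaults to the old behavior, and one method per strategy instead of interleaved if/else blocks.

```java
final class ReducerEstimateDispatchSketch {
    // Hypothetical switch (not a real Hive property); false keeps the
    // existing behavior unless the user opts in (point 1).
    static final boolean USE_TABLE_SCAN_STATS_DEFAULT = false;

    // The entry point only picks a strategy; each strategy lives in its own
    // method rather than being mixed together with if/else blocks (point 2).
    static int reduceParallelism(boolean useTableScanStats,
                                 long operatorStatsBytes,
                                 long tableScanBytes,
                                 long bytesPerReducer,
                                 int maxReducers) {
        return useTableScanStats
                ? fromTableScanStats(tableScanBytes, bytesPerReducer, maxReducers)
                : fromOperatorStats(operatorStatsBytes, bytesPerReducer, maxReducers);
    }

    // Stand-in for the existing estimate driven by operator stats.
    static int fromOperatorStats(long bytes, long bytesPerReducer, int maxReducers) {
        return clampedCeilDiv(bytes, bytesPerReducer, maxReducers);
    }

    // Stand-in for the new estimate driven by TableScan stats.
    static int fromTableScanStats(long bytes, long bytesPerReducer, int maxReducers) {
        return clampedCeilDiv(bytes, bytesPerReducer, maxReducers);
    }

    // Assumed formula: ceil(bytes / bytesPerReducer), clamped to [1, maxReducers].
    static int clampedCeilDiv(long bytes, long bytesPerReducer, int maxReducers) {
        long n = (bytes + bytesPerReducer - 1) / bytesPerReducer;
        return (int) Math.max(1, Math.min(maxReducers, n));
    }
}
```

With the flag left at its default, callers get the same number they got before; the new estimate only applies when the flag is turned on.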

> HoS: poor reducer parallelism when operator stats are not accurate
> ------------------------------------------------------------------
>
>                 Key: HIVE-15796
>                 URL: https://issues.apache.org/jira/browse/HIVE-15796
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>    Affects Versions: 2.2.0
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>         Attachments: HIVE-15796.1.patch, HIVE-15796.2.patch, 
> HIVE-15796.wip.1.patch, HIVE-15796.wip.2.patch, HIVE-15796.wip.patch
>
>
> In HoS we currently use operator stats to determine reducer parallelism. 
> However, it is often the case that operator stats are not accurate, 
> especially if column stats are not available. This sometimes generates 
> extremely poor reducer parallelism and causes HoS queries to run forever. 
> This JIRA tries to offer an alternative way to compute reducer parallelism, 
> similar to how MR does it. Here's the approach we are suggesting:
> 1. when computing the parallelism for a MapWork, use stats associated with 
> the TableScan operator;
> 2. when computing the parallelism for a ReduceWork, use the *maximum* 
> parallelism from all its parents.
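
The two rules in the description can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the patch itself: the Work class is a made-up stand-in for Hive's work graph, a work with no parents is treated as a MapWork, and the ceil(tableScanBytes / bytesPerReducer) formula and the bytesPerReducer and maxReducers parameters are assumed for the example.

```java
import java.util.Arrays;
import java.util.List;

final class ParallelismSketch {
    /** Hypothetical stand-in for a node in the work graph; not a Hive class. */
    static final class Work {
        final String name;
        final long tableScanBytes;   // only used when the work has no parents
        final List<Work> parents;

        Work(String name, long tableScanBytes, Work... parents) {
            this.name = name;
            this.tableScanBytes = tableScanBytes;
            this.parents = Arrays.asList(parents);
        }
    }

    static int parallelism(Work work, long bytesPerReducer, int maxReducers) {
        if (work.parents.isEmpty()) {
            // Rule 1 (map side): estimate from the TableScan data size.
            // Assumed formula: ceil(bytes / bytesPerReducer), clamped to [1, maxReducers].
            long n = (work.tableScanBytes + bytesPerReducer - 1) / bytesPerReducer;
            return (int) Math.max(1, Math.min(maxReducers, n));
        }
        // Rule 2 (reduce side): take the maximum parallelism over all parents.
        int max = 1;
        for (Work parent : work.parents) {
            max = Math.max(max, parallelism(parent, bytesPerReducer, maxReducers));
        }
        return max;
    }

    public static void main(String[] args) {
        Work map1 = new Work("Map 1", 1000L);
        Work map2 = new Work("Map 2", 250L);
        Work reducer = new Work("Reducer 2", 0L, map1, map2);
        // Map 1 -> 10, Map 2 -> 3, so the reducer takes max(10, 3).
        System.out.println(parallelism(reducer, 100L, 1009)); // prints 10
    }
}
```

Because a reducer's value depends only on its parents, an inaccurate operator-level estimate further down the plan cannot shrink the parallelism below what the scanned data size implies.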



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
