[
https://issues.apache.org/jira/browse/HIVE-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174341#comment-14174341
]
Chao commented on HIVE-8486:
----------------------------
OK, I debugged this query. To estimate the number of reducers,
{{SetSparkReducerParallelism}} needs to obtain statistics from the siblings of
the current reduce sink and add up their total number of bytes. However, for
some reason the {{statistics}} field of every sibling is null, so the total
number of bytes ends up being 0. As a result, it uses only one reducer.
I'm wondering whether this is something we haven't implemented yet, or whether
it's a bug.
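
To make the failure mode above concrete, here is a minimal sketch of that kind of estimate. It is not the actual {{SetSparkReducerParallelism}} code: the class {{ReducerEstimateSketch}}, the {{Stats}} type, and the {{bytesPerReducer}} and {{maxReducers}} parameters are all hypothetical names chosen for illustration. It only shows how null sibling statistics add up to 0 bytes and leave the estimate at a single reducer.

```java
import java.util.Arrays;
import java.util.List;

class ReducerEstimateSketch {
    // Hypothetical stand-in for an operator's statistics.
    // A null reference models a sibling whose statistics field was never set.
    static final class Stats {
        final long dataSize; // estimated output size in bytes

        Stats(long dataSize) {
            this.dataSize = dataSize;
        }
    }

    // Sums the bytes reported by the siblings and derives a reducer count.
    // bytesPerReducer and maxReducers are assumed tuning parameters.
    static int estimateReducers(List<Stats> siblingStats, long bytesPerReducer, int maxReducers) {
        long totalBytes = 0;
        for (Stats s : siblingStats) {
            if (s != null) {
                totalBytes += s.dataSize; // null statistics contribute nothing
            }
        }
        // Ceiling division, then clamp to the range [1, maxReducers].
        long wanted = (totalBytes + bytesPerReducer - 1) / bytesPerReducer;
        return (int) Math.max(1, Math.min(maxReducers, wanted));
    }

    public static void main(String[] args) {
        long bytesPerReducer = 256L * 1024 * 1024; // 256 MiB, assumed value
        long tenGiB = 10L * 1024 * 1024 * 1024;

        // All sibling statistics missing: total is 0 bytes, so the estimate is 1.
        List<Stats> missing = Arrays.asList((Stats) null, (Stats) null);
        System.out.println(estimateReducers(missing, bytesPerReducer, 999)); // prints 1

        // Statistics present: 20 GiB / 256 MiB gives 80 reducers.
        List<Stats> present = Arrays.asList(new Stats(tenGiB), new Stats(tenGiB));
        System.out.println(estimateReducers(present, bytesPerReducer, 999)); // prints 80
    }
}
```

In this sketch the clamp to a minimum of 1 is what hides the problem: a total of 0 bytes looks like a legitimate tiny input rather than missing statistics.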
> TPC-DS Query 96 parallelism is not set correctly
> ------------------------------------------------
>
> Key: HIVE-8486
> URL: https://issues.apache.org/jira/browse/HIVE-8486
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Brock Noland
> Assignee: Chao
>
> When we run the query on a 20B we only have a parallelism factor of 1.