[ https://issues.apache.org/jira/browse/HIVE-9251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267213#comment-14267213 ]
Rui Li commented on HIVE-9251:
------------------------------

I quickly checked the failed tests. Most of the failures are in the query plans, because the number of reducers changed; some tests may also need a SORT_QUERY_RESULT tag.

If we decide the number of reducers based on input size and cluster info, maybe we shouldn't expose it in the query plan, given that the input size may change and we currently need some hacks/workarounds to get the Spark cluster info. Any ideas?

> SetSparkReducerParallelism is likely to set too small number of reducers [Spark Branch]
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-9251
>                 URL: https://issues.apache.org/jira/browse/HIVE-9251
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-9251.1-spark.patch, HIVE-9251.2-spark.patch
>
>
> This may hurt performance or even lead to task failures. For example, Spark's netty-based shuffle limits the max frame size to 2G.
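
To illustrate the idea under discussion, here is a minimal sketch of size-based reducer estimation. It is not the actual SetSparkReducerParallelism code; the class and method names are hypothetical, though hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max are the real Hive settings such logic typically reads, and the 2G cap mirrors the netty shuffle frame limit mentioned in the issue description.

{code:java}
// Hypothetical sketch of input-size-based reducer estimation, similar in
// spirit to SetSparkReducerParallelism; names and constants are illustrative,
// not Hive's actual implementation.
public final class ReducerEstimator {

    // Spark's netty-based shuffle caps a single frame at roughly 2G, so each
    // reducer's share of the shuffled data should stay below that bound.
    private static final long MAX_FRAME_SIZE = 2L * 1024 * 1024 * 1024;

    /**
     * Estimate the number of reducers for a given total input size.
     *
     * @param totalInputBytes estimated bytes flowing into the shuffle
     * @param bytesPerReducer target data per reducer (e.g. the value of
     *                        hive.exec.reducers.bytes.per.reducer)
     * @param maxReducers     upper bound (e.g. hive.exec.reducers.max)
     */
    public static int estimateReducers(long totalInputBytes,
                                       long bytesPerReducer,
                                       int maxReducers) {
        // Never target more data per reducer than one shuffle frame allows.
        long target = Math.min(bytesPerReducer, MAX_FRAME_SIZE);
        // Ceiling division, then clamp to [1, maxReducers].
        long reducers = (totalInputBytes + target - 1) / target;
        return (int) Math.max(1, Math.min(reducers, maxReducers));
    }

    public static void main(String[] args) {
        // 100 GB of shuffle input, 256 MB per reducer, capped at 999 reducers.
        System.out.println(estimateReducers(100L << 30, 256L << 20, 999));
        // -> 400
    }
}
{code}

Clamping the per-reducer target at the frame limit is what keeps a too-small estimated reducer count from pushing more than 2G through a single shuffle frame, which is the failure mode the issue describes.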