[ 
https://issues.apache.org/jira/browse/HIVE-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539218#comment-14539218
 ] 

Xuefu Zhang commented on HIVE-10671:
------------------------------------

Hi Rui, the user has the following data sizes (output of the command
hadoop fs -du -h /tpch):
{code}
2.3 G    6.9 G    /tpch/customer
74.1 G   222.3 G  /tpch/lineitem
2.2 K    6.5 K    /tpch/nation
16.6 G   49.7 G   /tpch/orders
2.3 G    6.9 G    /tpch/part
11.4 G   34.1 G   /tpch/partsupp
389      1.1 K    /tpch/region
136.3 M  408.8 M  /tpch/supplier
{code}
The user's cluster has 27 nodes. The user observed 43s for yarn-client vs 162s
for yarn-cluster. All other configuration settings are the same in both cases.
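
For reference, the gap implied by these numbers can be worked out with a quick sketch like the one below. The variable names are only illustrative, and the replication estimate assumes the second column of the du output is the space consumed across all replicas, which has not been confirmed for this cluster.

```python
# Wall-clock times reported above, in seconds.
yarn_client_s = 43
yarn_cluster_s = 162

# How many times slower yarn-cluster was than yarn-client.
slowdown = yarn_cluster_s / yarn_client_s
print(f"yarn-cluster slowdown: {slowdown:.2f}x")

# lineitem sizes from the listing above, in GB.
# Assumption: column 1 is the raw size, column 2 includes all replicas.
lineitem_raw_gb = 74.1
lineitem_on_disk_gb = 222.3

# Implied HDFS replication factor under that assumption.
replication = lineitem_on_disk_gb / lineitem_raw_gb
print(f"implied replication factor: {replication:.1f}")
```

If that reading of the du output is right, the data is stored with the usual 3x replication, and the observed slowdown is somewhat larger than the 2x or 3x mentioned in the issue description.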



> yarn-cluster mode offers a degraded performance from yarn-client [Spark 
> Branch]
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-10671
>                 URL: https://issues.apache.org/jira/browse/HIVE-10671
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Rui Li
>
> With Hive on Spark, users noticed that in certain cases 
> spark.master=yarn-client offers 2x or 3x better performance than 
> spark.master=yarn-cluster. However, yarn-cluster is what we recommend and 
> support. Thus, we should investigate and fix the problem. One such query is 
> TPC-H query 22.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
