[ https://issues.apache.org/jira/browse/HIVE-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539218#comment-14539218 ]
Xuefu Zhang commented on HIVE-10671:
------------------------------------
Hi Rui, the user has the following data sizes (output of hadoop fs -du -h /tpch):
{code}
2.3 G 6.9 G /tpch/customer
74.1 G 222.3 G /tpch/lineitem
2.2 K 6.5 K /tpch/nation
16.6 G 49.7 G /tpch/orders
2.3 G 6.9 G /tpch/part
11.4 G 34.1 G /tpch/partsupp
389 1.1 K /tpch/region
136.3 M 408.8 M /tpch/supplier
{code}
The user's cluster has 27 nodes. The user observed 43s with yarn-client vs. 162s
with yarn-cluster. All other configuration is the same in both cases.
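For a quick sanity check of these numbers, here is a rough Python sketch that totals the listing above and computes the slowdown. It assumes the first column is the raw file size, the second is the disk space consumed including replication, and that the K/M/G suffixes are 1024-based; the helper name parse_du_line is made up for illustration.

```python
# Rough sketch: total the `hadoop fs -du -h` listing and compute the slowdown.
# Assumptions: column 1 = raw size, column 2 = space consumed with replication,
# unit suffixes are binary (1024-based). parse_du_line is a hypothetical helper.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

DU_OUTPUT = """\
2.3 G 6.9 G /tpch/customer
74.1 G 222.3 G /tpch/lineitem
2.2 K 6.5 K /tpch/nation
16.6 G 49.7 G /tpch/orders
2.3 G 6.9 G /tpch/part
11.4 G 34.1 G /tpch/partsupp
389 1.1 K /tpch/region
136.3 M 408.8 M /tpch/supplier
"""


def parse_du_line(line):
    """Return (path, raw_bytes, consumed_bytes) for one line of the listing."""
    tokens = line.split()
    path, fields = tokens[-1], tokens[:-1]
    sizes = []
    i = 0
    while i < len(fields):
        value = float(fields[i])
        # A unit suffix is optional: plain byte counts such as "389" have none.
        if i + 1 < len(fields) and fields[i + 1] in UNITS:
            value *= UNITS[fields[i + 1]]
            i += 2
        else:
            i += 1
        sizes.append(value)
    return path, sizes[0], sizes[1]


rows = [parse_du_line(line) for line in DU_OUTPUT.splitlines()]
total_raw = sum(raw for _, raw, _ in rows)
total_consumed = sum(consumed for _, _, consumed in rows)

# Timings reported by the user, in seconds.
yarn_client_s, yarn_cluster_s = 43, 162
slowdown = yarn_cluster_s / yarn_client_s

print(f"raw data:      {total_raw / 1024**3:.1f} G")
print(f"consumed:      {total_consumed / 1024**3:.1f} G")
print(f"consumed/raw:  {total_consumed / total_raw:.2f}")
print(f"yarn-cluster / yarn-client: {slowdown:.2f}x")
```

Under those assumptions the data set comes to a bit over 100 G of raw data, the consumed/raw ratio is consistent with a replication factor of 3, and the reported timings put yarn-cluster at close to 4x slower than yarn-client.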
> yarn-cluster mode offers a degraded performance from yarn-client [Spark
> Branch]
> -------------------------------------------------------------------------------
>
> Key: HIVE-10671
> URL: https://issues.apache.org/jira/browse/HIVE-10671
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Reporter: Xuefu Zhang
> Assignee: Rui Li
>
> With Hive on Spark, users noticed that in certain cases
> spark.master=yarn-client offers 2x or 3x better performance than if
> spark.master=yarn-cluster. However, yarn-cluster is what we recommend and
> support. Thus, we should investigate and fix the problem. One such query is
> TPC-H 22.