[
https://issues.apache.org/jira/browse/SPARK-21714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Marcelo Vanzin updated SPARK-21714:
-----------------------------------
Fix Version/s: 2.2.1
> SparkSubmit in Yarn Client mode downloads remote files and then reuploads
> them again
> ------------------------------------------------------------------------------------
>
> Key: SPARK-21714
> URL: https://issues.apache.org/jira/browse/SPARK-21714
> Project: Spark
> Issue Type: Bug
> Components: Spark Submit
> Affects Versions: 2.2.0
> Reporter: Thomas Graves
> Assignee: Saisai Shao
> Priority: Critical
> Fix For: 2.2.1, 2.3.0
>
>
> SPARK-10643 added the ability for spark-submit to download remote files in
> client mode.
> However, in YARN mode this introduced a bug: spark-submit downloads the files
> for the client, but the YARN client then re-uploads them to HDFS and uses
> those copies.
> This should not happen when the remote file is already on HDFS. It wastes
> resources and defeats the distributed cache: if the original object was
> public, it would have been shared by many users, but after the download and
> re-upload it becomes private.
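
The behavior the description asks for can be sketched as a small decision function. This is only an illustration, not the actual Spark patch: the function name, its parameters, and the CLUSTER_READABLE_SCHEMES set are hypothetical, and the sketch assumes that "hdfs" is the only scheme the cluster can read directly.

```python
from urllib.parse import urlparse

# Assumption: schemes the cluster's distributed cache can read in place.
# The real set depends on the Hadoop configuration; "hdfs" is just the
# case named in the issue description.
CLUSTER_READABLE_SCHEMES = {"hdfs"}


def should_download_locally(uri: str, master: str, deploy_mode: str) -> bool:
    """Return True if the submitting client should fetch `uri` to local disk.

    Hypothetical helper: it only models the decision described in the issue.
    """
    scheme = urlparse(uri).scheme or "file"

    # Local paths need no download.
    if scheme in ("file", "local"):
        return False

    # The download feature from SPARK-10643 applies to client mode only.
    if deploy_mode != "client":
        return False

    # On YARN, leave cluster-readable files where they are so the
    # distributed cache can reference the original (possibly public) object.
    if master == "yarn" and scheme in CLUSTER_READABLE_SCHEMES:
        return False

    return True
```

Under these assumptions, an hdfs:// jar submitted in YARN client mode is left in place, while an https:// jar is still downloaded because the cluster cannot be assumed to read it directly.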