Josh Rosen created SPARK-4896:
---------------------------------
Summary: Don't redundantly copy executor dependencies in Utils.fetchFile
Key: SPARK-4896
URL: https://issues.apache.org/jira/browse/SPARK-4896
Project: Spark
Issue Type: Improvement
Reporter: Josh Rosen
This JIRA is spun off from a comment by [~rdub] on SPARK-3967, quoted here:
{quote}
I've been debugging this issue as well and I think I've found an issue in
{{org.apache.spark.util.Utils}} that is contributing to / causing the problem:
{{Files.move}} on [line 390|https://github.com/apache/spark/blob/v1.1.0/core/src/main/scala/org/apache/spark/util/Utils.scala#L390]
is called even if {{targetFile}} exists and {{tempFile}} and {{targetFile}}
are equal.
The check on [line 379|https://github.com/apache/spark/blob/v1.1.0/core/src/main/scala/org/apache/spark/util/Utils.scala#L379]
seems to imply the desire to skip a redundant overwrite if the file is already
there and has the contents that it should have.
Gating the {{Files.move}} call on a further {{if (!targetFile.exists)}} fixes
the issue for me; attached is a patch of the change.
In practice all of my executors that hit this code path are finding every
dependency JAR to already exist and be exactly equal to what they need it to
be, meaning they were all needlessly overwriting all of their dependency JARs,
and now are all basically no-op-ing in {{Utils.fetchFile}}; I've not determined
who/what is putting the JARs there, why the issue only crops up in
{{yarn-cluster}} mode (or {{--master yarn --deploy-mode cluster}}), etc., but
it seems like either way this patch is probably desirable.
{quote}
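For readers without the patch at hand, the gating described in the quote can be sketched roughly as follows. This is an illustration only, not the code from the pull request: the object {{FetchFileSketch}}, the helper {{moveIfNeeded}}, and its {{overwrite}} parameter are hypothetical names, and {{java.nio.file.Files}} is used here so the sketch has no external dependencies, which may differ from the {{Files}} utility that Utils.scala actually imports.

```scala
import java.nio.file.{Files, Path, StandardCopyOption}
import java.util.Arrays

// Hypothetical sketch of the idea in the quote: skip the move when the
// target already exists with the expected contents.
object FetchFileSketch {

  /** Moves tempFile to targetFile unless an identical targetFile is already present.
    * Returns true if targetFile was written, false if the move was skipped.
    */
  def moveIfNeeded(tempFile: Path, targetFile: Path, overwrite: Boolean): Boolean = {
    if (Files.exists(targetFile)) {
      // Whole-file comparison; acceptable for a sketch, costly for large JARs.
      val sameContents =
        Arrays.equals(Files.readAllBytes(tempFile), Files.readAllBytes(targetFile))
      if (sameContents) {
        // Nothing to do: drop the temporary copy (unless it is the target itself).
        if (!Files.isSameFile(tempFile, targetFile)) Files.delete(tempFile)
        false
      } else if (overwrite) {
        Files.move(tempFile, targetFile, StandardCopyOption.REPLACE_EXISTING)
        true
      } else {
        throw new IllegalStateException(
          s"File $targetFile exists and does not match contents of $tempFile")
      }
    } else {
      // Target is absent, so the move is the only way to put it in place.
      Files.move(tempFile, targetFile)
      true
    }
  }
}
```

With this shape, an executor that finds every dependency JAR already present and identical returns without touching the target files, which matches the no-op behavior the quote reports after the change.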
I'm spinning this off into its own JIRA so that we can track the merging of
https://github.com/apache/spark/pull/2848 separately (since we have multiple
PRs that contribute to fixing the original issue).