[
https://issues.apache.org/jira/browse/SPARK-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Or updated SPARK-8716:
-----------------------------
Description:
More specifically, this is the feature that is currently flagged by
`spark.files.useFetchCache`.
This is a complicated feature that has no tests. I cannot say with confidence
that it actually works on all cluster managers. In particular, I believe it
doesn't work on Mesos, because the code that falls into this else case creates
its own temp directory per executor:
https://github.com/apache/spark/blob/881662e9c93893430756320f51cef0fc6643f681/core/src/main/scala/org/apache/spark/util/Utils.scala#L739.
It's also not immediately clear that it works in standalone mode, due to the
lack of comments. It does in fact work there, because the Worker happens to set
a `SPARK_EXECUTOR_DIRS` variable, but that linkage could be documented more
explicitly in the code.
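As a rough sketch of that linkage (the object and method names below are
illustrative, not Spark's actual API): the cache can only be shared when every
executor on a host resolves the same local root, which standalone mode
guarantees via `SPARK_EXECUTOR_DIRS` and Mesos does not:

```scala
import java.io.File
import java.nio.file.Files

// Illustrative sketch only; names are hypothetical, not Spark's real API.
object FetchCacheDirSketch {
  // Standalone mode: the Worker exports SPARK_EXECUTOR_DIRS, so every
  // executor on the host resolves the same root and the cache is shared.
  // Mesos (the else case referenced above): each executor creates its own
  // temp directory, so cached files are never reused across executors.
  def localRootDir(sparkExecutorDirs: Option[String]): File =
    sparkExecutorDirs match {
      case Some(dirs) => new File(dirs.split(File.pathSeparator).head)
      case None       => Files.createTempDirectory("spark-executor-").toFile
    }
}
```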
This is difficult to write tests for, but it's still important to do so.
Otherwise, semi-related changes in the future may easily break it without
anyone noticing.
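A regression test might, as a sketch, simulate two executors fetching the same
file against a shared cache root and assert that only one download happens
(the helper below is hypothetical; a real test would go through
Utils.fetchFile with an appropriately configured SparkConf):

```scala
import java.io.File
import java.nio.file.Files

// Hypothetical test helper, not Spark's actual API: the second fetch of
// the same name under the same cache root should hit the cached copy.
object FetchCacheTestSketch {
  def fetchWithCache(
      cacheRoot: File,
      name: String,
      download: () => Array[Byte]): (File, Boolean) = {
    val cached = new File(cacheRoot, name + "_cache")
    if (cached.exists()) {
      (cached, true) // cache hit: skip the download entirely
    } else {
      Files.write(cached.toPath, download())
      (cached, false) // cache miss: downloaded and stored for the next executor
    }
  }
}
```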
Related issues: SPARK-8130, SPARK-6313, SPARK-2713
was:
More specifically, this is the feature that is currently flagged by
`spark.files.useFetchCache`. There are several reasons why we should remove it.
(1) It doesn't even work. Each executor was recently given its own unique temp
directory for security reasons.
(2) There is no way to fix it. The constraints in (1) are fundamentally opposed
to sharing resources across executors.
(3) It is very complex. The method Utils.fetchFile would be greatly simplified
without this unused feature.
(4) There are no tests for it and it is difficult to test.
Note that we can't just revert the respective patches because they were merged
a long time ago.
Related issues: SPARK-8130, SPARK-6313, SPARK-2713
> Write tests for executor shared cache feature
> ---------------------------------------------
>
> Key: SPARK-8716
> URL: https://issues.apache.org/jira/browse/SPARK-8716
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 1.2.0
> Reporter: Andrew Or
>
> More specifically, this is the feature that is currently flagged by
> `spark.files.useFetchCache`.
> This is a complicated feature that has no tests. I cannot say with confidence
> that it actually works on all cluster managers. In particular, I believe it
> doesn't work on Mesos, because the code that falls into this else case
> creates its own temp directory per executor:
> https://github.com/apache/spark/blob/881662e9c93893430756320f51cef0fc6643f681/core/src/main/scala/org/apache/spark/util/Utils.scala#L739.
> It's also not immediately clear that it works in standalone mode, due to the
> lack of comments. It does in fact work there, because the Worker happens to
> set a `SPARK_EXECUTOR_DIRS` variable, but that linkage could be documented
> more explicitly in the code.
> This is difficult to write tests for, but it's still important to do so.
> Otherwise, semi-related changes in the future may easily break it without
> anyone noticing.
> Related issues: SPARK-8130, SPARK-6313, SPARK-2713
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]