[ https://issues.apache.org/jira/browse/SPARK-26082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun resolved SPARK-26082.
-----------------------------------
Resolution: Fixed
Fix Version/s: 3.0.0, 2.4.1, 2.3.4
This is resolved via https://github.com/apache/spark/pull/23734
> Misnaming of spark.mesos.fetch(er)Cache.enable in MesosClusterScheduler
> -----------------------------------------------------------------------
>
> Key: SPARK-26082
> URL: https://issues.apache.org/jira/browse/SPARK-26082
> Project: Spark
> Issue Type: Bug
> Components: Mesos
> Affects Versions: 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 2.2.1, 2.2.2, 2.3.0,
> 2.3.1, 2.3.2
> Reporter: Martin Loncaric
> Priority: Major
> Fix For: 2.3.4, 2.4.1, 3.0.0
>
>
> Currently in
> [docs|https://spark.apache.org/docs/latest/running-on-mesos.html]:
> {quote}spark.mesos.fetcherCache.enable / false / If set to `true`, all URIs
> (example: `spark.executor.uri`, `spark.mesos.uris`) will be cached by the
> Mesos Fetcher Cache
> {quote}
> Currently in {{MesosClusterScheduler.scala}} (which passes the parameter to
> the driver):
> {{private val useFetchCache =
> conf.getBoolean("spark.mesos.fetchCache.enable", false)}}
> Currently in {{MesosCoarseGrainedSchedulerBackend.scala}} (which passes the
> Mesos caching parameter to executors):
> {{private val useFetcherCache =
> conf.getBoolean("spark.mesos.fetcherCache.enable", false)}}
> This naming discrepancy dates back to version 2.0.0
> ([jira|http://mail-archives.apache.org/mod_mbox/spark-issues/201606.mbox/%[email protected]%3E]).
> This means that when {{spark.mesos.fetcherCache.enable=true}} is specified,
> the Mesos cache will be used only for executors, and not for drivers.
> IMPACT:
> Not caching these driver files (typically at least the Spark binaries, a
> custom jar, and additional dependencies) adds considerable network traffic
> and startup time when Spark applications are run frequently on a Mesos
> cluster. In addition, with the cache off, extracted files like
> {{spark-x.x.x-bin-*.tgz}} are copied into the sandbox and left there
> (rather than extracted directly without an extra copy), which can
> considerably increase disk usage. Users CAN currently work around this by
> specifying the {{spark.mesos.fetchCache.enable}} option, but this should at
> least be noted in the documentation.
> SUGGESTED FIX:
> Add {{spark.mesos.fetchCache.enable}} to the documentation for versions 2 -
> 2.4, and update {{MesosClusterScheduler.scala}} to use
> {{spark.mesos.fetcherCache.enable}} going forward (literally a one-line
> change).
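
For readers skimming the report above, the key mismatch can be reproduced without a Spark or Mesos installation. The following is only a minimal sketch: FakeConf is a hypothetical stand-in for SparkConf, and the three helper methods are illustrative names, not code from MesosClusterScheduler.scala or from the pull request. It assumes the fix is the one-line key change suggested in the report.

```scala
// Hypothetical stand-in for SparkConf: a plain Map with a getBoolean helper.
// This is NOT the real SparkConf class; it only mimics the lookup semantics
// of conf.getBoolean(key, default) as quoted in the report.
final case class FakeConf(settings: Map[String, String]) {
  def getBoolean(key: String, default: Boolean): Boolean =
    settings.get(key).map(_.toBoolean).getOrElse(default)
}

object FetcherCacheKeys {
  // Key named in the documentation and read on the executor path.
  val DocumentedKey = "spark.mesos.fetcherCache.enable"
  // Misnamed key read on the driver path, per the report.
  val MisnamedKey = "spark.mesos.fetchCache.enable"

  // Executor path: reads the documented key.
  def executorUsesCache(conf: FakeConf): Boolean =
    conf.getBoolean(DocumentedKey, false)

  // Driver path as reported: reads the misnamed key, so the documented
  // setting is silently ignored.
  def driverUsesCacheBeforeFix(conf: FakeConf): Boolean =
    conf.getBoolean(MisnamedKey, false)

  // Driver path assuming the suggested one-line change: reads the documented key.
  def driverUsesCacheAfterFix(conf: FakeConf): Boolean =
    conf.getBoolean(DocumentedKey, false)
}
```

With only spark.mesos.fetcherCache.enable=true set, executorUsesCache returns true while driverUsesCacheBeforeFix returns false, which is the behavior the report describes. Setting both keys corresponds to the workaround mentioned under IMPACT for affected versions.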