[
https://issues.apache.org/jira/browse/SPARK-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514484#comment-14514484
]
Marcelo Vanzin commented on SPARK-7108:
---------------------------------------
So, just to clarify.
Since 1.0, if you set {{SPARK_LOCAL_DIRS}} in the Worker environment, that will
propagate to executors and always override the user setting, as you can see in
Utils.scala (from 1.2.0; also note the separate YARN code path):
{code}
val confValue = if (isRunningInYarnContainer(conf)) {
  // If we are in yarn mode, systems can have different disk layouts so we must set it
  // to what Yarn on this system said was available.
  getYarnLocalDirs(conf)
} else {
  Option(conf.getenv("SPARK_LOCAL_DIRS")).getOrElse(
    conf.get("spark.local.dir", System.getProperty("java.io.tmpdir")))
}
{code}
There's a bug in that code, or at the very least a discrepancy with the
documentation, that allows the user setting to be used if the Worker does not
have {{SPARK_LOCAL_DIRS}} in its environment. That's the only thing that
changed in 1.3.0 (the Worker now always sets a value for the env variable).
Because that fallback was never documented, I'm willing to call this "user error". As
Josh succinctly put it in the bug description, "local directories / disks are a
property of the cluster and not the application".
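To make the precedence concrete, here is a minimal, self-contained sketch of the non-YARN branch. It is not Spark code: {{LocalDirPrecedence}}, {{resolveLocalDirs}}, and the example paths are hypothetical, and plain maps stand in for the process environment and for SparkConf.

```scala
// Hypothetical model of the lookup order described above; not a Spark API.
object LocalDirPrecedence {
  // env     stands in for the executor's process environment
  // conf    stands in for the application's SparkConf
  // tmpDir  stands in for System.getProperty("java.io.tmpdir")
  def resolveLocalDirs(
      env: Map[String, String],
      conf: Map[String, String],
      tmpDir: String): String = {
    // Same order as the non-YARN branch: the env variable first, then the
    // user's spark.local.dir, then the JVM temp directory.
    env.get("SPARK_LOCAL_DIRS")
      .orElse(conf.get("spark.local.dir"))
      .getOrElse(tmpDir)
  }

  def main(args: Array[String]): Unit = {
    // Assumed example values, chosen only for illustration.
    val userConf = Map("spark.local.dir" -> "/data/app")

    // Env variable absent (possible before 1.3.0): the user setting is used.
    println(resolveLocalDirs(Map.empty, userConf, "/tmp"))

    // Env variable present (always the case once the Worker sets it):
    // the user setting is ignored.
    println(resolveLocalDirs(Map("SPARK_LOCAL_DIRS" -> "/data/worker"), userConf, "/tmp"))
  }
}
```

The first call models a Worker that does not export the variable, which is the undocumented fallback; the second models a Worker that always exports it, so the application's value never gets consulted.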
> spark.local.dir is no longer honored in Standalone mode
> -------------------------------------------------------
>
> Key: SPARK-7108
> URL: https://issues.apache.org/jira/browse/SPARK-7108
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.2.1, 1.3.0
> Reporter: Josh Rosen
> Priority: Critical
>
> Prior to SPARK-4834, configuring spark.local.dir in the driver would affect
> the local directories created on the executor. After this patch, executors
> will always ignore this setting in favor of directories read from
> {{SPARK_LOCAL_DIRS}}, which is set by the standalone worker based on the
> worker's own configuration and not the application configuration.
> This change impacts users who configured {{spark.local.dir}} only in their
> driver and not via their cluster's {{spark-defaults.conf}} or
> {{spark-env.sh}} files. This is an atypical use-case, since the available
> local directories / disks are a property of the cluster and not the
> application, which probably explains why this issue has not been reported
> previously.
> The correct fix might be comment + documentation improvements.