[ https://issues.apache.org/jira/browse/SPARK-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514484#comment-14514484 ]

Marcelo Vanzin commented on SPARK-7108:
---------------------------------------

So, just to clarify.

Since 1.0, if you set {{SPARK_LOCAL_DIRS}} in the Worker environment, that value
propagates to executors and always overrides the user's {{spark.local.dir}}
setting, as you can see in Utils.scala (from 1.2.0; also note the separate YARN
code path):

{code}
    val confValue = if (isRunningInYarnContainer(conf)) {
      // If we are in yarn mode, systems can have different disk layouts so we must set it
      // to what Yarn on this system said was available.
      getYarnLocalDirs(conf)
    } else {
      Option(conf.getenv("SPARK_LOCAL_DIRS")).getOrElse(
        conf.get("spark.local.dir", System.getProperty("java.io.tmpdir")))
    }
{code}

There's a bug in that code, or at the very least a discrepancy with the
documentation: it allows the user's setting to take effect when the Worker does
not have {{SPARK_LOCAL_DIRS}} in its environment. That's the only thing that
changed in 1.3.0 (the Worker now always sets a value for the env variable).
Since that fallback was never documented, I'm willing to call this "user
error". As Josh succinctly put it in the bug description, "local directories /
disks are a property of the cluster and not the application".
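
To make the precedence concrete, here is a minimal standalone sketch (with a hypothetical helper name, not the actual Utils.scala code) of the non-YARN lookup order the snippet above implements: the {{SPARK_LOCAL_DIRS}} environment variable wins over {{spark.local.dir}}, which in turn falls back to {{java.io.tmpdir}}:

{code}
// Hypothetical sketch of the non-YARN precedence:
// SPARK_LOCAL_DIRS (env) > spark.local.dir (conf) > java.io.tmpdir (JVM default)
def resolveLocalDirs(env: Map[String, String], conf: Map[String, String]): String = {
  env.get("SPARK_LOCAL_DIRS")
    .orElse(conf.get("spark.local.dir"))
    .getOrElse(System.getProperty("java.io.tmpdir"))
}

// Once the Worker always exports SPARK_LOCAL_DIRS (the 1.3.0 behavior),
// the first branch always wins and the user's spark.local.dir is ignored:
//   resolveLocalDirs(Map("SPARK_LOCAL_DIRS" -> "/mnt/worker"),
//                    Map("spark.local.dir" -> "/tmp/user"))   // "/mnt/worker"
{code}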


> spark.local.dir is no longer honored in Standalone mode
> -------------------------------------------------------
>
>                 Key: SPARK-7108
>                 URL: https://issues.apache.org/jira/browse/SPARK-7108
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.1, 1.3.0
>            Reporter: Josh Rosen
>            Priority: Critical
>
> Prior to SPARK-4834, configuring spark.local.dir in the driver would affect 
> the local directories created on the executor.  After this patch, executors 
> will always ignore this setting in favor of directories read from 
> {{SPARK_LOCAL_DIRS}}, which is set by the standalone worker based on the 
> worker's own configuration and not the application configuration.
> This change impacts users who configured {{spark.local.dir}} only in their 
> driver and not via their cluster's {{spark-defaults.conf}} or 
> {{spark-env.sh}} files.  This is an atypical use-case, since the available 
> local directories / disks are a property of the cluster and not the 
> application, which probably explains why this issue has not been reported 
> previously.
> The correct fix might be comment + documentation improvements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
