[
https://issues.apache.org/jira/browse/SPARK-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552756#comment-14552756
]
Matt Cheah commented on SPARK-7108:
-----------------------------------
Just wanted to add my two cents here. I've had several cases where the way
SPARK_LOCAL_DIRS and spark.local.dir interact has caused problems.
The issue I'm seeing right now is that SPARK_LOCAL_DIRS is set on the Worker
daemon, which means local dirs cannot be set per application. The worker passes
this setting down to each executor (see ExecutorRunner.scala), where it
overrides the value in the driver's SparkConf. As a result, all applications
must share the same directory specified by SPARK_LOCAL_DIRS.
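To make the interaction concrete, here is a sketch of the two competing
configuration paths; the paths and values below are illustrative, not taken
from any real deployment:

```shell
# On each worker host, the worker daemon's environment (e.g. conf/spark-env.sh)
# fixes the local dirs for every executor that worker launches:
export SPARK_LOCAL_DIRS=/mnt/disk1/spark,/mnt/disk2/spark

# Meanwhile, an application may request its own location at submit time:
#   spark-submit --conf spark.local.dir=/mnt/app-scratch ...
# In standalone mode the worker's SPARK_LOCAL_DIRS wins, and the
# per-application /mnt/app-scratch setting is silently ignored by executors.
```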
The problem is that I want different applications to have different local
directories. This boils down to a cleanup issue. Spark does not automatically
clean up shuffle files when an application completes (perhaps that's the root
difficulty?). I can use a cron job or post-application hook to clean up the
local dir, except that if another Spark application is running and using the
same directory, the cleanup will break it. This would not be an issue if
applications did not have to share the same SPARK_LOCAL_DIRS location; each
application's local dir could then be cleaned up without clobbering other
applications.
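For illustration, a post-application cleanup hook is only safe when each
application has a private local dir. Everything below (the app id, the
directory layout) is hypothetical; it is not how standalone mode behaves today:

```shell
# Hypothetical per-application local dir. The id and layout are invented
# for this sketch; standalone mode currently shares one SPARK_LOCAL_DIRS
# location across applications.
APP_ID="app-20150521-0001"
APP_LOCAL_DIR="${TMPDIR:-/tmp}/spark-local-demo/${APP_ID}"

mkdir -p "${APP_LOCAL_DIR}"
touch "${APP_LOCAL_DIR}/shuffle_0_0_0.data"   # stand-in for a shuffle file

# Post-application hook: because the dir is private to this application,
# removing it cannot clobber another running application's shuffle files.
rm -rf "${APP_LOCAL_DIR}"
```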
Is this really an issue of Spark not automatically cleaning up shuffle files?
Or is it a legitimate use case for different applications to have different
local dirs for their shuffle files? Either way, I'd like to hear feedback on
this.
> spark.local.dir is no longer honored in Standalone mode
> -------------------------------------------------------
>
> Key: SPARK-7108
> URL: https://issues.apache.org/jira/browse/SPARK-7108
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.2.1, 1.3.0
> Reporter: Josh Rosen
> Priority: Critical
>
> Prior to SPARK-4834, configuring spark.local.dir in the driver would affect
> the local directories created on the executor. After this patch, executors
> will always ignore this setting in favor of directories read from
> {{SPARK_LOCAL_DIRS}}, which is set by the standalone worker based on the
> worker's own configuration and not the application configuration.
> This change impacts users who configured {{spark.local.dir}} only in their
> driver and not via their cluster's {{spark-defaults.conf}} or
> {{spark-env.sh}} files. This is an atypical use-case, since the available
> local directories / disks are a property of the cluster and not the
> application, which probably explains why this issue has not been reported
> previously.
> The correct fix might be comment + documentation improvements.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)