[ https://issues.apache.org/jira/browse/TEZ-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093487#comment-14093487 ]
Siddharth Seth commented on TEZ-1393:
-------------------------------------
[~airbots] Adding a few notes on the different directories involved. I think my
previous comment had some details on this as well:
local-dirs: Normally set up by YARN. This is where intermediate Inputs /
Outputs generate their data, and YARN cleans it up after job completion. It is
referenced explicitly, so it is OK to change. This needs to be application
specific rather than container specific.
PWD: working directory for the running JVMs / AM. This is where resources are
localized, and this directory is always on the classpath.
StagingDir: directory where the client writes out files so that they can be
localized into the distributed cache. In local mode, since we don't localize
files, we end up reading these resources directly from this path.
user.dir: This is a system property. When using LocalFileSystem, this is the
default working directory for the FileSystem instance. DataSources and data
sinks which use FileSystem will just use the FileSystem instance, irrespective
of whether it's local, HDFS, S3, etc. They won't attempt to change any
directories. This is why setting this system property to be container
specific does not work: the final output would be spread across directories.
Setting it within the client is also problematic, since users will typically
generate data into a relative path, which will not be accessible if this
parameter is updated in LocalMode.
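To make the user.dir point concrete, here is a minimal Java sketch. LocalFsLike
is a hypothetical stand-in that, roughly like Hadoop's RawLocalFileSystem,
reads user.dir once when it is created and resolves relative paths against it;
the /tmp/staging/app_1 path is made up. It only illustrates why an instance
created after user.dir has been reset resolves the same relative path to a
different location.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class UserDirCapture {
    // Hypothetical stand-in for a local FileSystem: it records user.dir as its
    // working directory at construction time and resolves relative paths
    // against that captured value.
    static final class LocalFsLike {
        private final Path workingDir = Paths.get(System.getProperty("user.dir"));

        Path qualify(String path) {
            return workingDir.resolve(path).normalize();
        }
    }

    public static void main(String[] args) {
        String original = System.getProperty("user.dir");
        // Created by the client before any reset, e.g. to write input data.
        LocalFsLike clientFs = new LocalFsLike();

        // Simulate resetting user.dir to an application staging dir (hypothetical path).
        System.setProperty("user.dir", "/tmp/staging/app_1");
        // Created afterwards, e.g. by a data sink running inside the job.
        LocalFsLike sinkFs = new LocalFsLike();
        // Restore the JVM-wide property so nothing else is affected.
        System.setProperty("user.dir", original);

        // The same relative path now names two different locations.
        System.out.println(clientFs.qualify("output"));
        System.out.println(sinkFs.qualify("output"));
    }
}
```

Because the property is JVM-wide, any instance created while it is reset keeps
the staging path for its whole lifetime, even after the value is restored.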
bq. I am confused why we allow users to set the staging dir (like in
OrderedWordCount). From the system point of view, the staging dir is the
directory where the system prepares the user's job after submission. It should
be the same for consistency.
Users can provide a staging directory if they want Tez files to be generated
within that directory, instead of the standard location that would otherwise
be configured in tez-site.xml.
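A rough Java sketch of that precedence, under stated assumptions: the property
name "tez.staging-dir" is assumed (check TezConfiguration for the exact key and
default in the Tez version in use), tez-site.xml is modeled as a plain Map, and
resolveStagingDir plus the fallback path are hypothetical names for
illustration only.

```java
import java.util.HashMap;
import java.util.Map;

public class StagingDirChoice {
    // Assumed property name; verify against TezConfiguration.
    static final String STAGING_DIR_KEY = "tez.staging-dir";

    // Hypothetical resolution order: a directory passed explicitly for this job
    // (as OrderedWordCount allows) wins over the tez-site.xml value, which in
    // turn wins over a built-in fallback.
    static String resolveStagingDir(String userProvided, Map<String, String> tezSite, String fallback) {
        if (userProvided != null && !userProvided.isEmpty()) {
            return userProvided;
        }
        return tezSite.getOrDefault(STAGING_DIR_KEY, fallback);
    }

    public static void main(String[] args) {
        Map<String, String> tezSite = new HashMap<>();
        tezSite.put(STAGING_DIR_KEY, "/shared/tez/staging"); // hypothetical cluster value
        // No per-job directory: the tez-site.xml value is used.
        System.out.println(resolveStagingDir(null, tezSite, "/tmp/tez/staging"));
        // Per-job directory supplied by the user: it takes precedence.
        System.out.println(resolveStagingDir("/home/alice/wc-staging", tezSite, "/tmp/tez/staging"));
    }
}
```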
> user.dir should not be reset in LocalMode
> -----------------------------------------
>
> Key: TEZ-1393
> URL: https://issues.apache.org/jira/browse/TEZ-1393
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Priority: Blocker
> Attachments: TEZ-1393.1.txt, TEZ-1393.2.txt
>
>
> Setting user.dir to the staging-dir has the implication of changing the root
> of the filesystem to the staging directory, which is application specific.
> If multiple instances of LocalClient are used within a single JVM, this can
> become problematic across jobs, since the CWD changes in the middle of
> execution, which ends up disallowing the use of relative paths.
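One way to picture the alternative is a minimal Java sketch, not the attached
patch: each local-mode client receives its own application-specific directory
explicitly and never mutates user.dir, so user-supplied relative paths resolve
identically for every client in the JVM. LocalClientSketch, the
/tmp/tez-local root, the app IDs and the plan.pb file name are all
hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PerAppLocalDirs {
    // Hypothetical local-mode client that leaves user.dir untouched.
    static final class LocalClientSketch {
        private final Path appScratchDir;

        LocalClientSketch(Path stagingRoot, String appId) {
            // Framework-internal files live in a per-application directory.
            this.appScratchDir = stagingRoot.resolve(appId);
        }

        Path internalFile(String name) {
            return appScratchDir.resolve(name);
        }

        // User-supplied relative paths keep resolving against the JVM's
        // unchanged working directory, which is the same for every client.
        Path userPath(String relative) {
            return Paths.get(System.getProperty("user.dir")).resolve(relative).normalize();
        }
    }

    public static void main(String[] args) {
        Path root = Paths.get("/tmp/tez-local"); // hypothetical staging root
        LocalClientSketch a = new LocalClientSketch(root, "app_1");
        LocalClientSketch b = new LocalClientSketch(root, "app_2");
        // Internal files are isolated per application.
        System.out.println(a.internalFile("plan.pb"));
        System.out.println(b.internalFile("plan.pb"));
        // true: both clients see the same working directory for user paths.
        System.out.println(a.userPath("input").equals(b.userPath("input")));
    }
}
```

Keeping the per-application state in an explicit field rather than a JVM-wide
system property is what lets several clients coexist in one JVM.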
--
This message was sent by Atlassian JIRA
(v6.2#6252)