[ 
https://issues.apache.org/jira/browse/TEZ-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093487#comment-14093487
 ] 

Siddharth Seth commented on TEZ-1393:
-------------------------------------

[~airbots] Adding a few notes on the different directories involved. I think my 
previous comment had some details on this as well:

local-dirs: Normally set up by YARN. This is where intermediate Inputs / 
Outputs generate their data, and YARN cleans it up after job completion. 
Because it is referenced explicitly, it is OK to change. This needs to be 
application-specific rather than container-specific.
PWD: the working directory for running JVMs / the AM. This is where resources 
are localized, and this directory is always on the classpath.
StagingDir: the directory where the client writes out files so that they can be 
localized into the distributed cache. In local mode, since we don't localize 
files, we end up reading these resources directly from this path.
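The roles of the three directories above can be summarized in a small model. This is a minimal sketch and not Tez or YARN code. The class `LocalModeDirs`, its methods, and the layout `localDir/<appId>` are hypothetical and were chosen only to show which directory each kind of file comes from. On a cluster, a resource is read from PWD after localization. In local mode, it is read directly from the staging directory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical model of the directories described above; none of these
// names are Tez or YARN APIs, and the directory layout is illustrative only.
public class LocalModeDirs {
    private final Path localDir;   // one of the YARN-managed local dirs
    private final Path pwd;        // working directory of the running JVM / AM
    private final Path stagingDir; // where the client writes files for localization

    public LocalModeDirs(Path localDir, Path pwd, Path stagingDir) {
        this.localDir = localDir;
        this.pwd = pwd;
        this.stagingDir = stagingDir;
    }

    // Intermediate Input/Output data goes under an application-level
    // directory, so every container of the app sees the same location.
    public Path intermediateDir(String appId) {
        return localDir.resolve(appId);
    }

    // On a cluster, resources are localized into PWD; in local mode nothing
    // is localized, so the resource is read straight from the staging dir.
    public Path resourcePath(String name, boolean localMode) {
        return localMode ? stagingDir.resolve(name) : pwd.resolve(name);
    }

    public static void main(String[] args) {
        LocalModeDirs dirs = new LocalModeDirs(
            Paths.get("/data/yarn/local"),
            Paths.get("/data/yarn/local/container_01"),
            Paths.get("/tmp/alice/tez/staging"));
        System.out.println(dirs.intermediateDir("application_01"));
        System.out.println(dirs.resourcePath("app.jar", true));  // staging dir
        System.out.println(dirs.resourcePath("app.jar", false)); // PWD
    }
}
```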

user.dir: This is a system property. When using LocalFileSystem, this is the 
default working directory for the FileSystem instance. Data sources and data 
sinks that use FileSystem simply use the FileSystem instance, irrespective of 
whether it's local, HDFS, S3, etc. They won't attempt to change any 
directories. This is why setting this system property to a container-specific 
value does not work: the final output would be spread across directories. 
Setting it within the client is also problematic, since users will typically 
generate data into a relative path, which will not be accessible if this 
property is updated in LocalMode.
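The sketch below shows why a container-specific user.dir spreads output across directories. It is hypothetical and not Hadoop code. It assumes a local FileSystem resolves relative paths against a working directory taken from `user.dir`, and the helper names are made up for illustration. Two containers write the same relative output path under two different values of `user.dir`, and the "same" output directory ends up in two places. The same effect breaks a client that changes `user.dir` and later reads its own relative input.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: resolve a relative path against a working directory
// read from the user.dir system property, as assumed for a local FileSystem.
public class UserDirResolution {

    // Reads user.dir on every call, so a change to the property changes
    // where subsequent relative paths resolve.
    static Path workingDir() {
        return Paths.get(System.getProperty("user.dir"));
    }

    static Path resolve(String relative) {
        return workingDir().resolve(relative);
    }

    public static void main(String[] args) {
        String original = System.getProperty("user.dir");
        try {
            // Two containers of the same job, each with a container-specific user.dir.
            System.setProperty("user.dir", "/work/container_01");
            Path a = resolve("out/part-0");
            System.setProperty("user.dir", "/work/container_02");
            Path b = resolve("out/part-1");
            // The single logical output directory is now split in two.
            System.out.println(a.getParent() + " vs " + b.getParent());
        } finally {
            System.setProperty("user.dir", original); // restore the JVM-wide value
        }
    }
}
```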

bq. I am confused why we allow the user to set the staging dir (like in 
OrderedWordCount). From the system's point of view, the staging dir is the 
directory where the system prepares the user's job after submission. It should 
be the same for consistency.
Users can provide a staging directory if they want Tez files to be 
generated within that directory, instead of the standard location that would 
otherwise be configured in tez-site.xml.
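The precedence described above works like this: an explicitly supplied directory wins, and otherwise the configured location is used. This is a minimal sketch under stated assumptions. `java.util.Properties` stands in for the real Tez configuration object. The key name `tez.staging-dir` is assumed to be the tez-site.xml property. `chooseStagingDir` and the fallback value are hypothetical.

```java
import java.util.Properties;

// Hypothetical sketch of staging-dir precedence. Properties stands in for the
// Tez configuration; the key name below is an assumption.
public class StagingDirChoice {
    // Assumed tez-site.xml key for the default staging directory.
    static final String STAGING_DIR_KEY = "tez.staging-dir";

    static String chooseStagingDir(String userSupplied, Properties conf, String fallback) {
        // 1. A directory passed by the user (e.g. by an example job) takes priority.
        if (userSupplied != null && !userSupplied.isEmpty()) {
            return userSupplied;
        }
        // 2. Otherwise use the configured value, or a built-in fallback.
        return conf.getProperty(STAGING_DIR_KEY, fallback);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(STAGING_DIR_KEY, "/tmp/tez/staging");
        System.out.println(chooseStagingDir(null, conf, "/tmp/default"));            // prints /tmp/tez/staging
        System.out.println(chooseStagingDir("/user/alice/wc", conf, "/tmp/default")); // prints /user/alice/wc
    }
}
```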

> user.dir should not be reset in LocalMode
> -----------------------------------------
>
>                 Key: TEZ-1393
>                 URL: https://issues.apache.org/jira/browse/TEZ-1393
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>            Priority: Blocker
>         Attachments: TEZ-1393.1.txt, TEZ-1393.2.txt
>
>
> Setting user.dir to the staging-dir has the implication of changing the root 
> of the filesystem to the staging directory, which is application specific.
> If multiple instances of LocalClient are used within a single JVM, this 
> becomes problematic across jobs, since the CWD changes in the middle of 
> execution, which effectively rules out the use of relative paths.
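An alternative to resetting `user.dir` is to keep each job's staging directory as explicit per-job state. This is a hypothetical sketch of that idea and not the TEZ-1393 patch. `PerJobDirs` and `LocalJob` are invented names. Two local jobs in the same JVM keep separate staging locations, and a user's relative input path still resolves to the same place for both, because the JVM-wide property is never modified.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: each local job keeps its staging directory as an
// explicit field instead of writing it into the JVM-wide user.dir property.
public class PerJobDirs {
    static final class LocalJob {
        final Path stagingDir;
        // Read once from the unmodified user.dir, so relative user paths keep
        // one meaning for the lifetime of the job.
        final Path cwd = Paths.get(System.getProperty("user.dir"));

        LocalJob(Path stagingDir) {
            this.stagingDir = stagingDir;
        }

        Path stagingFile(String name) {
            return stagingDir.resolve(name);
        }

        Path userPath(String relative) {
            return cwd.resolve(relative);
        }
    }

    public static void main(String[] args) {
        LocalJob job1 = new LocalJob(Paths.get("/tmp/staging/app_1"));
        LocalJob job2 = new LocalJob(Paths.get("/tmp/staging/app_2"));
        // Staging files stay job-specific...
        System.out.println(job1.stagingFile("app.jar"));
        System.out.println(job2.stagingFile("app.jar"));
        // ...while a relative input path resolves identically for both jobs.
        System.out.println(job1.userPath("input").equals(job2.userPath("input")));
    }
}
```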



--
This message was sent by Atlassian JIRA
(v6.2#6252)
