[ 
https://issues.apache.org/jira/browse/TEZ-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091322#comment-14091322
 ] 

Siddharth Seth commented on TEZ-1393:
-------------------------------------

bq. Because if there are concurrent DAGs running in local mode, it will be 
problematic since each DAG may create a pb file.
Concurrent execution poses additional problems beyond just the staging 
directory and the resources localized into it. Let's address those in the 
concurrent execution jira - if that's required. I believe Pig has logic to 
effectively queue the DAGs sequentially if concurrent apps are not available 
(pools of some kind).
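
To illustrate what "queue the DAGs sequentially" could look like, here is a 
minimal sketch using a single-threaded executor from the JDK. It is not Pig's 
or Tez's code: SequentialDagQueue, runInOrder and the DAG names are 
hypothetical, and a "DAG" here is just a string that gets recorded when its 
turn comes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of Pig or Tez.
class SequentialDagQueue {

    // Submits every DAG to a one-thread pool, so at most one runs at a time
    // and they complete in submission order.
    static List<String> runInOrder(List<String> dagNames) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        List<String> completed = Collections.synchronizedList(new ArrayList<>());
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String name : dagNames) {
                // Stand-in for "submit the DAG and wait for it to finish".
                pending.add(pool.submit(() -> { completed.add(name); }));
            }
            for (Future<?> f : pending) {
                f.get(); // block until this DAG's turn has finished
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(completed);
    }

    public static void main(String[] args) {
        List<String> dags = new ArrayList<>();
        dags.add("dag-1");
        dags.add("dag-2");
        System.out.println(runInOrder(dags));
    }
}
```

With a single worker thread there is never more than one DAG using the 
staging directory at a time, which sidesteps the concurrency question rather 
than solving it.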

bq. My question is if we hard code the working-directory to be ".", we add 
dependency to the user current environment. I mean what if user has background 
programs that changing contents under "."? I think, TEZ-1337 can help, I mean 
we read from configuration. Also the configuration is useful for us to reduce 
dependency on node manager configuration(nm-local.dir) for shuffle.
I think working-directory is an incorrect name - this is really a directory 
that allows local-mode to work without localizing files. If localization were 
in place, files would get localized to the working directory - and from there 
things should just work.
"." is only used when running on regular clusters. Earlier we used a relative 
path which would inherently assume the files to be somewhere in the current 
working directory.
In terms of shuffle-handler dependencies - we need to depend on the 
NM-LOCAL-DIRS since that's how the ShuffleHandler works. That is not related to 
this change though - the local directory continues to be under the staging 
directory till TEZ-1337 allows making that configurable.

bq. Siddharth Seth, initially I looked into a ThreadLocalProperties solution 
but didn't have the time to test its feasibility. This technique has the added 
benefit that user code doesn't need to be written in this same working 
directory way and works for multiple LocalClients. If you want I can pursue 
this idea to see if it meets our criteria. It is advanced, yes, but not 
complicated.
Jon, I'm not sure ThreadLocalProperties would work for changing user.dir. 
That's also used when the filesystem is file:///, and effectively decides where 
MRInput / MROutput read/write data in local mode. In terms of the 
local directories themselves, this is already in place. If we support 
localization, we'll have to look at how this needs to be fixed - since that 
will use the default file:/// filesystem as well, but needs a directory per 
container. 
I don't think we'll be able to completely support localization though (not 
without large changes), since these files can be accessed anywhere in the task 
- and that isn't easy to control. An example here is splits being sent via HDFS 
- which we'll likely have to special case.
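
As a small illustration of why resetting user.dir is JVM-wide, the sketch 
below sets the property from one thread and reads it from another using only 
JDK calls. It does not use Tez or Hadoop; UserDirVisibility, 
observedByOtherThread and the staging path are hypothetical names chosen for 
the example.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class, not part of Tez.
class UserDirVisibility {

    // Sets user.dir from one thread, reads it from a second thread, then
    // restores the original value. Returns what the second thread saw.
    static String observedByOtherThread(String stagingDir) {
        String original = System.getProperty("user.dir");
        AtomicReference<String> seen = new AtomicReference<>();
        try {
            Thread writer = new Thread(() -> System.setProperty("user.dir", stagingDir));
            writer.start();
            writer.join();

            Thread reader = new Thread(() -> seen.set(System.getProperty("user.dir")));
            reader.start();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            // System properties are shared by the whole JVM, so put it back.
            System.setProperty("user.dir", original);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        // Assumed, illustrative path; nothing is created on disk.
        System.out.println(observedByOtherThread("/tmp/hypothetical-staging-dir"));
    }
}
```

The reader thread sees the value written by the writer thread, because 
System.setProperty changes a single table shared by every thread in the 
process.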

> user.dir should not be reset in LocalMode
> -----------------------------------------
>
>                 Key: TEZ-1393
>                 URL: https://issues.apache.org/jira/browse/TEZ-1393
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>            Priority: Critical
>         Attachments: TEZ-1393.1.txt
>
>
> Setting user.dir to the staging-dir has the implication of changing the root 
> of the filesystem to the staging directory, which is application specific.
> If using multiple instances of LocalClient within a single JVM, this can get 
> problematic between multiple jobs, since the CWD will change in the middle of 
> execution - which ends up disallowing usage of relative paths.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
