[
https://issues.apache.org/jira/browse/TEZ-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091322#comment-14091322
]
Siddharth Seth commented on TEZ-1393:
-------------------------------------
bq. Because if there are concurrent DAGs running in local mode, it will be
problematic since each DAG may create a pb file.
Concurrent execution poses additional problems beyond just the staging
directory and the resources localized into it. Let's address those in the
concurrent execution jira - if that's required. I believe Pig has logic to
effectively queue the DAGs sequentially if concurrent apps are not available
(pools of some kind).
bq. My question is if we hard code the working-directory to be ".", we add a
dependency on the user's current environment. I mean what if the user has
background programs that are changing contents under "."? I think TEZ-1337 can
help, I mean we read from configuration. Also the configuration is useful for
us to reduce the dependency on node manager configuration (nm-local.dir) for
shuffle.
working-directory is, I think, an incorrect name - this is really a directory
to allow local-mode to work without localizing files. If localization were in
place, files would get localized to the working directory - and from there
things should just work.
"." is only used when running on regular clusters. Earlier we used a relative
path which would inherently assume the files to be somewhere in the current
working directory.
In terms of shuffle-handler dependencies - we need to depend on the
NM-LOCAL-DIRS since that's how the ShuffleHandler works. That is not related to
this change though - the local directory continues to be under the staging
directory till TEZ-1337 allows making that configurable.
bq. Siddharth Seth, initially I looked into a ThreadLocalProperties solution
but didn't have the time to test its feasibility. This technique has the added
benefit that user code doesn't need to be written in this same working
directory way and works for multiple LocalClients. If you want I can pursue
this idea to see if it meets our criteria. It is advanced, yes, but not
complicated.
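For readers unfamiliar with the idea, a ThreadLocalProperties approach could look roughly like the sketch below. This is only an illustration under assumptions: the class and its setLocal / readOnFreshThread helpers are hypothetical and are not part of Tez or the JDK. It relies on System.getProperty delegating to whatever Properties object was installed with System.setProperties, so any code that obtains user.dir some other way would not see the per-thread value.

```java
import java.util.Properties;

// Hypothetical sketch of the "ThreadLocalProperties" idea; not an existing Tez or JDK class.
class ThreadLocalProperties extends Properties {
    // Per-thread overrides; every thread starts with an empty set.
    private final ThreadLocal<Properties> overrides = ThreadLocal.withInitial(Properties::new);

    ThreadLocalProperties(Properties jvmWide) {
        super(jvmWide); // lookups fall back to the JVM-wide values
    }

    // Sets a value that only the calling thread will see.
    void setLocal(String key, String value) {
        overrides.get().setProperty(key, value);
    }

    @Override
    public String getProperty(String key) {
        String local = overrides.get().getProperty(key);
        return local != null ? local : super.getProperty(key);
    }

    // Reads a property from a brand-new thread, which has no local overrides.
    static String readOnFreshThread(Properties props, String key) {
        String[] out = new String[1];
        Thread t = new Thread(() -> out[0] = props.getProperty(key));
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out[0];
    }

    public static void main(String[] args) {
        Properties original = System.getProperties();
        ThreadLocalProperties props = new ThreadLocalProperties(original);
        System.setProperties(props); // System.getProperty now goes through the wrapper
        try {
            // The path is a made-up staging directory, used only for illustration.
            props.setLocal("user.dir", "/tmp/staging/app_A");
            System.out.println("this thread:  " + System.getProperty("user.dir"));
            System.out.println("other thread: " + readOnFreshThread(props, "user.dir"));
        } finally {
            System.setProperties(original); // restore the JVM-wide properties
        }
    }
}
```

Whether this is enough for local mode depends on how each consumer of user.dir reads it, which is the open question in the reply that follows.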
Jon, I'm not sure ThreadLocalProperties would work for changing user.dir.
That's also used when the filesystem is file:///, and effectively decides where
MRInput / MROutput read/write data in local mode. In terms of the
local directories themselves, this is already in place. If we support
localization, we'll have to look at how this needs to be fixed - since that
will use the default file:/// filesystem as well, but needs to be a directory
per container.
I don't think we'll be able to completely support localization though (not
without large changes), since these files can be accessed anywhere in the task
- and that isn't easy to control. An example here is splits being sent via HDFS
- which we'll likely have to special case.
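As background on why resetting user.dir per application is fragile: system properties are shared by the whole JVM, so a second client that resets the property changes what the first one reads. A minimal sketch, assuming two hypothetical local-mode clients and made-up staging paths (none of this is Tez code):

```java
// Hypothetical stand-in for two local-mode clients in one JVM; not Tez code.
class UserDirClobberDemo {

    // Returns the user.dir value that "client A" observes after "client B"
    // has reset the same property from a different thread.
    static String observedByClientA(String stagingA, String stagingB) {
        String original = System.getProperty("user.dir");
        try {
            // Client A points user.dir at its own staging directory.
            System.setProperty("user.dir", stagingA);

            // Client B does the same for its staging directory.
            Thread clientB = new Thread(() -> System.setProperty("user.dir", stagingB));
            clientB.start();
            try {
                clientB.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }

            // Client A reads the property again, in the middle of its "execution".
            return System.getProperty("user.dir");
        } finally {
            System.setProperty("user.dir", original); // restore for the rest of the JVM
        }
    }

    public static void main(String[] args) {
        String seen = observedByClientA("/tmp/staging/app_A", "/tmp/staging/app_B");
        System.out.println("client A now sees user.dir = " + seen);
    }
}
```

Client A ends up seeing client B's directory, which is the mid-execution CWD change described in the issue summary.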
> user.dir should not be reset in LocalMode
> -----------------------------------------
>
> Key: TEZ-1393
> URL: https://issues.apache.org/jira/browse/TEZ-1393
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Priority: Critical
> Attachments: TEZ-1393.1.txt
>
>
> Setting user.dir to the staging-dir has the implication of changing the root
> of the filesystem to the staging directory, which is application specific.
> If using multiple instances of LocalClient within a single JVM - this can get
> problematic between multiple jobs, since the CWD will change in the middle of
> execution - which ends up disallowing usage of relative paths.