[ https://issues.apache.org/jira/browse/TEZ-717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078712#comment-14078712 ]

Siddharth Seth commented on TEZ-717:
------------------------------------

Thanks [~jeagles] - reply and some more review comments follow.

bq. Is there a way to set the default file system programmatically? I've been 
setting fs.defaultFS to file:/// in tez-site.xml for now, but would love to do 
what you are suggesting.
Setting fs.defaultFS programmatically on the TezConfiguration instance being 
used should fix this. More precisely, what matters is ensuring that the staging 
dir being created is always local. 
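
A minimal sketch of the second option, in plain JDK Java: qualify the staging dir against the local file system so it can never resolve against a remote fs.defaultFS. The class and method names are hypothetical, not Tez APIs. In the patch itself the value would still be set on the TezConfiguration (for example conf.set("fs.defaultFS", "file:///"), since TezConfiguration extends Hadoop's Configuration).

```java
import java.net.URI;
import java.nio.file.Paths;

public class LocalStagingDirSketch {

    /**
     * Hypothetical helper: qualifies a staging-dir setting against the local
     * file system so the result never depends on fs.defaultFS.
     * Scheme-less paths become file: URIs; non-local schemes are rejected.
     */
    static URI toLocalStagingUri(String stagingDir) {
        URI uri = URI.create(stagingDir);
        if (uri.getScheme() == null) {
            // No scheme: treat it as a local path (relative paths resolve against the cwd).
            return Paths.get(stagingDir).toAbsolutePath().normalize().toUri();
        }
        if (!"file".equals(uri.getScheme())) {
            throw new IllegalArgumentException(
                "Local mode needs a local staging dir, got: " + stagingDir);
        }
        return uri;
    }

    public static void main(String[] args) {
        // Prints a file: URI for the local path.
        System.out.println(toLocalStagingUri("/tmp/tez-staging"));
    }
}
```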

bq. Left this alone for now. If you don't think we need this in the first pass, 
I'm tempted to leave it for later. Unless you have a good solution idea.
Agreed. Let's track it in a follow-up jira.

bq. I've set the LOCAL_DIRS environment variable to address this. Something 
more elaborate is needed. What are your thoughts on this?
Is the expectation that the LOCAL_DIRS value set by the client will also be used 
by the LocalContainerLauncher? I think this is OK to get started, but it may 
make sense for the tasks and the AM to have different directories. Containers 
cannot have a per-containerId directory, since that would break local fetch, as 
well as remote fetch once that is supported. I don't think this needs to be 
fixed right now, though. 
One separation that would be useful, however, is between the directory written 
to by the client, the LOG_DIR, and the LOCAL_DIR; sub-directories within the 
work-dir should be sufficient.
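
A rough sketch of that layout, assuming a single work-dir per local session and hypothetical sub-directory names client, logs and local. The shared local directory is deliberately not keyed by containerId, per the fetch concern above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LocalModeWorkDirs {
    // Hypothetical layout: one sub-directory per role under a single work-dir.
    final Path clientDir; // written to by the client
    final Path logDir;    // intended as the LOG_DIR value
    final Path localDir;  // intended as the LOCAL_DIRS value; shared by all
                          // containers (not per containerId) so local fetch works

    LocalModeWorkDirs(Path workDir) throws IOException {
        this.clientDir = Files.createDirectories(workDir.resolve("client"));
        this.logDir = Files.createDirectories(workDir.resolve("logs"));
        this.localDir = Files.createDirectories(workDir.resolve("local"));
    }

    public static void main(String[] args) throws IOException {
        LocalModeWorkDirs dirs = new LocalModeWorkDirs(Files.createTempDirectory("tez-local-"));
        System.out.println("LOG_DIR=" + dirs.logDir);
        System.out.println("LOCAL_DIRS=" + dirs.localDir);
    }
}
```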

bq. At first, something like this wasn't necessary. Now that TezClient and 
DAGClient both are creating Local FrameworkClients, they need to share the 
dagAppMaster, clientHandler, and others. I'm ok with another approach such as a 
global cache of objects. Thoughts?
Do you mean that the TezClient and DAGClient would otherwise each end up 
creating a different instance of the FrameworkClient? One simple approach I can 
think of is to cache local client instances in 
FrameworkClient.createFrameworkClient.
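
A minimal sketch of the caching idea, using only JDK types. FrameworkClient, LocalClient, YarnClientStub, the key parameter and release() are hypothetical stand-ins; the real factory's signature may differ (it is configuration-driven rather than taking a boolean and a key).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FrameworkClientCacheSketch {

    /** Hypothetical stand-in for the FrameworkClient abstraction. */
    interface FrameworkClient { }

    /** Hypothetical local-mode client that owns the in-process AM state. */
    static final class LocalClient implements FrameworkClient { }

    /** Hypothetical YARN-backed client; a fresh instance per call is fine. */
    static final class YarnClientStub implements FrameworkClient { }

    // One LocalClient per key (e.g. a staging dir), shared by TezClient and DAGClient.
    private static final Map<String, LocalClient> LOCAL_CLIENTS = new ConcurrentHashMap<>();

    static FrameworkClient createFrameworkClient(boolean localMode, String key) {
        if (localMode) {
            // computeIfAbsent is atomic, so concurrent callers get the same instance.
            return LOCAL_CLIENTS.computeIfAbsent(key, k -> new LocalClient());
        }
        return new YarnClientStub();
    }

    /** Drops the cached instance, e.g. once the session is stopped. */
    static void release(String key) {
        LOCAL_CLIENTS.remove(key);
    }
}
```

A cache like this also needs an explicit release point; otherwise a stopped session's AM state stays reachable for the life of the JVM.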

Comments on the patch, which I think is mostly ready to go in:
- Minor: instead of the LocalClient(Configuration) constructor, could the 
configuration that is already passed in via init(Configuration) be used?
- The DAGAppMaster thread is only started once. This can be problematic in 
non-session mode, since the AM will end up in a final state after executing a 
single DAG, so subsequent DAG submissions using the same TezClient instance 
will not be possible. I think this can be addressed in a separate jira.
- The DAGAppMaster wait-state loop: should this also break out on SUCCEEDED 
(unlikely, but possible with a very fast DAG)? Also, as a subsequent 
enhancement, the DAG-AM thread could simply notify the client once 
initAndStartAppMaster completes.
- Minor: the (Float) cast in report.setProgress((Float) 
dagAppMaster.getProgress()); is not required.
- There are visibility issues for dagAppMaster, clientHandler and other 
variables that are set up in the DAGAppMaster thread. They are written in the 
DAG-AM thread and read in the client thread without synchronization and without 
being volatile. I think this needs to be fixed.
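
A sketch of the last two points together, with hypothetical names throughout (AmState, startAmThread, awaitInit, and clientHandler as a placeholder object rather than the real handler): fields written on the DAG-AM thread are volatile, the polling loop stops on RUNNING or any terminal state including SUCCEEDED, and a CountDownLatch lets the AM thread notify the client once its init step completes.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LocalAmHandoffSketch {

    // Hypothetical AM states; not necessarily the same set as Tez's own enum.
    enum AmState { NEW, INITED, RUNNING, SUCCEEDED, FAILED, KILLED, ERROR }

    // States at which the client can stop waiting: started, or already finished
    // (SUCCEEDED included, in case a tiny DAG completes before the client looks).
    static final Set<AmState> STOP_WAITING = EnumSet.of(
        AmState.RUNNING, AmState.SUCCEEDED, AmState.FAILED, AmState.KILLED, AmState.ERROR);

    // volatile: written on the DAG-AM thread, read on the client thread.
    volatile AmState state = AmState.NEW;
    volatile Object clientHandler; // placeholder for dagAppMaster / clientHandler

    private final CountDownLatch initDone = new CountDownLatch(1);

    void startAmThread() {
        Thread amThread = new Thread(() -> {
            try {
                clientHandler = new Object(); // stands in for initAndStartAppMaster()
                state = AmState.RUNNING;
            } catch (RuntimeException e) {
                state = AmState.ERROR;
            } finally {
                initDone.countDown(); // notify the client once init has completed
            }
        }, "DAGAppMaster-local");
        amThread.setDaemon(true);
        amThread.start();
    }

    /** Polling variant of the wait loop: exits on RUNNING or any terminal state. */
    AmState pollUntilStartedOrDone(long sleepMs) throws InterruptedException {
        AmState s = state;
        while (!STOP_WAITING.contains(s)) {
            Thread.sleep(sleepMs);
            s = state;
        }
        return s;
    }

    /** Notification variant: no polling; returns false on timeout. */
    boolean awaitInit(long timeout, TimeUnit unit) throws InterruptedException {
        return initDone.await(timeout, unit);
    }
}
```

Per the CountDownLatch memory-consistency guarantee, writes made before countDown() are visible to a thread after its await() returns, so the latch alone covers reads done after awaiting; volatile still matters for the polling variant and for state that keeps changing afterwards.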

Some other things that need to be looked at in follow-up jiras:
- Cleanup of user-dirs after a session completes
- Recovery in local mode: I don't think there's any use in enabling it.
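
For the user-dir cleanup item, a minimal JDK-only sketch of recursively deleting a session's work-dir. The helper name is hypothetical, and where it would be called (for example when the client is stopped) is left open.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class SessionDirCleanupSketch {

    /** Hypothetical helper: recursively delete a session's work-dir once the session ends. */
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return; // already cleaned up; safe to call twice
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order visits children before parents, so directories are empty when deleted.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (UncheckedIOException e) {
            throw e.getCause(); // surface the original IOException to the caller
        }
    }
}
```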


> Client changes to allow local mode DAG submission
> -------------------------------------------------
>
>                 Key: TEZ-717
>                 URL: https://issues.apache.org/jira/browse/TEZ-717
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Jonathan Eagles
>            Priority: Blocker
>         Attachments: TEZ-717-v10.patch, TEZ-717-v12.patch, TEZ-717-v13.patch, 
> TEZ-717-v6.patch, TEZ-717-v8.patch, TEZ-717-v9.patch, TEZ-717.patch, 
> TEZ-717.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)
