[ 
https://issues.apache.org/jira/browse/TEZ-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133032#comment-14133032
 ] 

Bikas Saha commented on TEZ-1563:
---------------------------------

The DAG contains many types of objects (vertices, edges, etc.), which in turn 
hold more objects, including byte buffers. The DAG might also have a huge 
number of vertices and edges. A deep copy may therefore be an unnecessary (and 
potentially slow) operation that is wasted in the 99% of cases where the 
failure case does not occur.

An alternative proposal would be to cache the DAGPlan object in the DAG after 
it has been created once. On error-based re-submission, TezClient can check for 
the cached DAGPlan and use it directly instead of re-compiling a previously 
compiled DAG. [~jeagles] What do you think?
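
To make the caching proposal concrete, here is a minimal sketch. SketchDag and SketchPlan are hypothetical stand-ins, not the real Tez DAG and DAGPlan classes, and the sketch assumes that the DAG is not modified between the first compilation and the re-submission.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a compiled plan; not the real Tez DAGPlan. */
final class SketchPlan {
    final Map<String, String> taskLocalFiles;

    SketchPlan(Map<String, String> files) {
        // Snapshot the files so the plan does not change after compilation.
        this.taskLocalFiles = Collections.unmodifiableMap(new HashMap<>(files));
    }
}

/** Hypothetical stand-in for a DAG whose compilation mutates internal state. */
final class SketchDag {
    private final Map<String, String> commonLocalFiles = new HashMap<>();
    private final Map<String, String> vertexLocalFiles = new HashMap<>();
    private SketchPlan cachedPlan; // null until the first successful compile
    int compileCount = 0;          // exposed only to make the sketch observable

    void addCommonLocalFile(String name, String uri) {
        commonLocalFiles.put(name, uri);
    }

    /** Compiles the plan once; later calls return the cached plan. */
    SketchPlan createPlan() {
        if (cachedPlan != null) {
            return cachedPlan; // re-submission path: nothing is mutated again
        }
        compileCount++;
        for (Map.Entry<String, String> e : commonLocalFiles.entrySet()) {
            // Adding the same resource twice is treated as an error here.
            if (vertexLocalFiles.putIfAbsent(e.getKey(), e.getValue()) != null) {
                throw new IllegalStateException("Duplicate resource: " + e.getKey());
            }
        }
        cachedPlan = new SketchPlan(vertexLocalFiles);
        return cachedPlan;
    }
}
```

With this shape, a client that catches a failed submission can call createPlan() again on the same object and get the plan from the first call, without a deep copy. The sketch ignores the configuration argument that the real compile step takes; a cached plan would only be safe to reuse if that configuration is unchanged too.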

> TezClient.submitDAGSession alters DAG local resources regardless of DAG 
> submission
> ----------------------------------------------------------------------------------
>
>                 Key: TEZ-1563
>                 URL: https://issues.apache.org/jira/browse/TEZ-1563
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.0
>            Reporter: Josh Elser
>            Assignee: Bikas Saha
>
> In {{TezClient#submitDAGSession(DAG)}}, a {{DAGPlan}} is created from the 
> {{DAG}} before the {{DAGClientAMProtocolBlockingPB}} is instantiated. When 
> the application isn't running, {{waitForProxy()}} will throw a 
> {{SessionNotRunning}} Exception.
> The problem is that the internal state of the {{DAG}} is modified, regardless 
> of whether the DAG is actually run or not. 
> {code}
> DAGPlan dagPlan = dag.createDag(amConfig.getTezConfiguration());
> {code}
> The {{createDag}} method will ultimately call {{addTaskLocalFiles}} for each 
> {{Vertex}} in the {{DAG}}
> {code}
> // add common task files for this DAG
> vertex.addTaskLocalFiles(commonTaskLocalFiles);
> {code}
> Because the {{DAG}}'s state is modified, {{Vertex#addTaskLocalFiles(Map)}} 
> will fail if any resources are added multiple times. As such, if the 
> application is not running and {{SessionNotRunning}} is thrown, that same DAG 
> cannot be passed in to run the DAG after the application is started again.
> Additionally, unlike {{Vertex}}, {{DAG}} has no {{getTaskLocalFiles}} method; 
> adding one would make the two classes more uniform.
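
The failure mode described in the issue can be reproduced with a small sketch. DemoVertex and DemoDag are hypothetical classes that only mimic the reported behaviour (compiling adds the common files to the vertex, and adding a file twice fails); they are not the Tez implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical vertex that rejects a resource name it already holds. */
final class DemoVertex {
    private final Map<String, String> taskLocalFiles = new HashMap<>();

    void addTaskLocalFiles(Map<String, String> files) {
        for (Map.Entry<String, String> e : files.entrySet()) {
            if (taskLocalFiles.containsKey(e.getKey())) {
                throw new IllegalStateException("Duplicate resource: " + e.getKey());
            }
            taskLocalFiles.put(e.getKey(), e.getValue());
        }
    }
}

/** Hypothetical DAG whose compile step mutates its single vertex. */
final class DemoDag {
    private final DemoVertex vertex = new DemoVertex();
    private final Map<String, String> commonTaskLocalFiles = new HashMap<>();

    void addCommonTaskLocalFile(String name, String uri) {
        commonTaskLocalFiles.put(name, uri);
    }

    /** Mimics the compile step: pushes the common files into the vertex. */
    void compile() {
        vertex.addTaskLocalFiles(commonTaskLocalFiles);
    }
}
```

Calling compile() once succeeds. Calling it a second time on the same DemoDag throws, which corresponds to re-submitting the same DAG after a SessionNotRunning failure.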



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
