[ 
https://issues.apache.org/jira/browse/TEZ-3509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652777#comment-15652777
 ] 

Siddharth Seth commented on TEZ-3509:
-------------------------------------

We may be better of relying on the ShuffleHandler  accepting the parameter on 
dag vs vertex delete, instead of a path. I can't see a case where we want to go 
more granular than that. Path level starts exposing a lot of information, which 
can otherwise be avoided, in the container launcher.

e.g. a call to the shuffle handler saying
dagDelete?action=delete&job=jobId&dag=dagId
vertexDelete?action=delete&job=jobId&dag=dagId&vertexId=vertexId
The ShuffleHandler has all the relevant information, and context available to 
delete the necessary files.

If going with the path route, have to make sure that no absolute paths are sent 
in. The paths in fact should be a localized directory for the app (i.e. I 
should not be able to delete local resources used by the app using this 
invocation).

In terms of the patch - mostly looks good. There's some access to private 
members of the Tracker class. That should be changed to methods (node 
registration).

I think there was mentioned of pluggability of this piece at some point. The 
ContainerLauncher can be shared by entities which use different 
ShuffleHandlers. e.g. If the MR ShuffleHandler were to be enhanced to support 
deletes, deletion would not work since it is tied to the new Tez 
ShuffleHandler. Making the entire DeletionTracker pluggable - with config 
coming in via the ContainerLauncher payload would help with this. tez-dag 
ideally should not be referencing the tez-runtime-library module. That's in 
place today because of some stuff in the ShuffleHandler, and can be broken by 
moving the VertexManager implementations out of tez-dag. Referencing 
ShuffleHandler directly brings in one more dependency from tez-dag to 
tez-runtime-library.

Unrelated to this specific patch.
The model used for dagComplete notifications is to only pass in the dagId. Dag 
info is supposed to be picked up from the getDagInfo method in 
{TaskCommunicator|ContainerLauncher}Context. At the moment, the entire DAG is 
exposed.

> Make DAG Deletion path based
> ----------------------------
>
>                 Key: TEZ-3509
>                 URL: https://issues.apache.org/jira/browse/TEZ-3509
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>         Attachments: TEZ-3509.001.patch
>
>
> The idea here is to have the API take a path to delete, be it DAG path today 
> or a vertex level path later on. The current implementation takes a flag 
> which is specific to DAG deletion. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to