[
https://issues.apache.org/jira/browse/TEZ-3509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652777#comment-15652777
]
Siddharth Seth commented on TEZ-3509:
-------------------------------------
We may be better of relying on the ShuffleHandler accepting the parameter on
dag vs vertex delete, instead of a path. I can't see a case where we want to go
more granular than that. Path level starts exposing a lot of information, which
can otherwise be avoided, in the container launcher.
e.g. a call to the shuffle handler saying
dagDelete?action=delete&job=jobId&dag=dagId
vertexDelete?action=delete&job=jobId&dag=dagId&vertexId=vertexId
The ShuffleHandler has all the relevant information, and context available to
delete the necessary files.
If going with the path route, have to make sure that no absolute paths are sent
in. The paths in fact should be a localized directory for the app (i.e. I
should not be able to delete local resources used by the app using this
invocation).
In terms of the patch - mostly looks good. There's some access to private
members of the Tracker class. That should be changed to methods (node
registration).
I think there was mentioned of pluggability of this piece at some point. The
ContainerLauncher can be shared by entities which use different
ShuffleHandlers. e.g. If the MR ShuffleHandler were to be enhanced to support
deletes, deletion would not work since it is tied to the new Tez
ShuffleHandler. Making the entire DeletionTracker pluggable - with config
coming in via the ContainerLauncher payload would help with this. tez-dag
ideally should not be referencing the tez-runtime-library module. That's in
place today because of some stuff in the ShuffleHandler, and can be broken by
moving the VertexManager implementations out of tez-dag. Referencing
ShuffleHandler directly brings in one more dependency from tez-dag to
tez-runtime-library.
Unrelated to this specific patch.
The model used for dagComplete notifications is to only pass in the dagId. Dag
info is supposed to be picked up from the getDagInfo method in
{TaskCommunicator|ContainerLauncher}Context. At the moment, the entire DAG is
exposed.
> Make DAG Deletion path based
> ----------------------------
>
> Key: TEZ-3509
> URL: https://issues.apache.org/jira/browse/TEZ-3509
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Attachments: TEZ-3509.001.patch
>
>
> The idea here is to have the API take a path to delete, be it DAG path today
> or a vertex level path later on. The current implementation takes a flag
> which is specific to DAG deletion.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)