[ https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kuhu Shukla updated TEZ-3363:
-----------------------------
Attachment: TEZ-3363.001.patch
First cut of the patch.
This patch adds a new event that every vertex sends to its ancestors up to a
given (configurable) height. The ancestors are precomputed during DagPlan
creation. When a vertex completes successfully, it sends this event to all of
those ancestors. A vertex that receives the event triggers a vertexComplete()
call chain that runs from the DagAppMaster through the ContainerLauncher down
to the DeletionTracker. The HTTP calls that delete a vertex's data are
currently sent serially to each node's shuffle handler; depending on feedback,
this can be changed to share the existing thread pool or to use a new one. The
shuffle handler does a regex match on the directory names and deletes the ones
belonging to the given vertex.
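A minimal sketch of the ancestor precomputation, assuming the DAG is available as a map from each vertex name to its parent (input) vertices. The class AncestorIndex and the method ancestorsWithinHeight are hypothetical and are not the DagPlan API used in the patch. The walk goes upstream level by level and stops after `height` hops, so each vertex ends up with the set of ancestors it would notify on successful completion.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: precomputes, per vertex, the ancestors it should notify. */
public class AncestorIndex {

    /**
     * For every vertex (every key of parents), returns the ancestors that are at
     * most `height` edges upstream. Walks the DAG level by level (BFS), so an
     * ancestor is included if its shortest upstream distance is <= height.
     */
    public static Map<String, Set<String>> ancestorsWithinHeight(
            Map<String, List<String>> parents, int height) {
        Map<String, Set<String>> result = new HashMap<>();
        for (String vertex : parents.keySet()) {
            Set<String> seen = new LinkedHashSet<>();
            Deque<String> frontier = new ArrayDeque<>(List.of(vertex));
            for (int level = 0; level < height && !frontier.isEmpty(); level++) {
                Deque<String> next = new ArrayDeque<>();
                for (String v : frontier) {
                    for (String p : parents.getOrDefault(v, List.of())) {
                        if (seen.add(p)) { // first visit happens at the shortest distance
                            next.add(p);
                        }
                    }
                }
                frontier = next;
            }
            result.put(vertex, seen);
        }
        return result;
    }

    public static void main(String[] args) {
        // Edges: v1 -> v2 -> v3 -> v4 and v0 -> v3 (parents listed per vertex).
        Map<String, List<String>> parents = new HashMap<>();
        parents.put("v0", List.of());
        parents.put("v1", List.of());
        parents.put("v2", List.of("v1"));
        parents.put("v3", List.of("v2", "v0"));
        parents.put("v4", List.of("v3"));
        System.out.println(ancestorsWithinHeight(parents, 2).get("v4")); // [v3, v2, v0]
    }
}
```

With a height of 2, v4 notifies v3 (one hop) plus v2 and v0 (two hops), but not v1, which is three hops upstream. Computing this once up front keeps the per-completion work to a simple lookup.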
If a vertex fails, vertex-level deletion does not kick in, since the DAG will
fail and cleanup will follow shortly anyway.
ShuffleUtils.isTezShuffleHandler calls have been added to VertexImpl, which
therefore has to import this new class; that may be undesirable (?).
Will let the precommit run and ask for some feedback soon.
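The regex-based directory matching that the shuffle handler performs can be sketched as follows. This assumes output directories are named like attempt_<clusterTimestamp>_<appId>_<dagId>_<vertexId>_<taskId>_<attempt> with an optional numeric suffix, loosely modeled on Tez task attempt IDs; the actual naming and regex in the patch may differ, and VertexDirCleaner and its methods are hypothetical. Every underscore-delimited field is matched explicitly, so that, for example, vertex 01 of DAG 1 never matches a directory from DAG 11.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Stream;

/** Hypothetical sketch of shuffle-handler-side cleanup for one vertex. */
public class VertexDirCleaner {

    /** Regex for one vertex's directories under the ASSUMED naming scheme. */
    static Pattern vertexDirPattern(int dagId, int vertexId) {
        // dagId is unpadded, vertexId is zero-padded to two digits (assumption).
        return Pattern.compile(String.format(
            "^attempt_\\d+_\\d+_%d_%02d_\\d+_\\d+(_\\d+)?$", dagId, vertexId));
    }

    /** Returns the directory names (not paths) that belong to the vertex. */
    static List<String> matchingDirs(Collection<String> dirNames, int dagId, int vertexId) {
        Pattern p = vertexDirPattern(dagId, vertexId);
        List<String> matches = new ArrayList<>();
        for (String name : dirNames) {
            if (p.matcher(name).matches()) {
                matches.add(name);
            }
        }
        return matches;
    }

    /** Deletes each matching directory under outputRoot, children before parents. */
    static void deleteVertexDirs(Path outputRoot, int dagId, int vertexId) throws IOException {
        List<String> names = new ArrayList<>();
        try (Stream<Path> children = Files.list(outputRoot)) {
            children.filter(Files::isDirectory)
                    .forEach(p -> names.add(p.getFileName().toString()));
        }
        for (String name : matchingDirs(names, dagId, vertexId)) {
            try (Stream<Path> walk = Files.walk(outputRoot.resolve(name))) {
                walk.sorted(Comparator.reverseOrder()) // deepest paths first
                    .map(Path::toFile)
                    .forEach(File::delete);
            }
        }
    }

    public static void main(String[] args) {
        List<String> dirs = List.of(
            "attempt_1469000000000_0001_1_00_000000_0_10003",
            "attempt_1469000000000_0001_1_01_000000_0_10003",
            "attempt_1469000000000_0001_1_01_000003_1");
        // Prints the two directories of DAG 1, vertex 01.
        System.out.println(matchingDirs(dirs, 1, 1));
    }
}
```

Keeping the matching separate from the file-system walk makes the name filter easy to test on its own and leaves the deletion step free to be moved onto a thread pool later.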
> Delete intermediate data at the vertex level for Shuffle Handler
> ----------------------------------------------------------------
>
> Key: TEZ-3363
> URL: https://issues.apache.org/jira/browse/TEZ-3363
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Jonathan Eagles
> Assignee: Kuhu Shukla
> Attachments: TEZ-3363.001.patch
>
>
> For applications like Pig, where processing times can be very long,
> applications may choose to delete intermediate data for a sub-DAG. For
> example, if a DAG has synced data to HDFS, all upstream intermediate data can
> be safely deleted.