[ https://issues.apache.org/jira/browse/TEZ-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225470#comment-16225470 ]
Kuhu Shukla commented on TEZ-3363:
----------------------------------
bq. On the vertex events - does the vertex make sure that every downstream
vertex at the specified depth is complete?
Yes, although looking at my current design, it might be fragile when the same
vertex sends duplicate vertex-complete events. Besides that, the children data
structure in VertexImpl ensures that all of them finish before
vertexComplete() on the ancestor is called. You are right that this might be
easier to do at the DAG level.
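To illustrate the duplicate-event concern, here is a minimal sketch of tracking
downstream completion by vertex name rather than by a counter, so that a
repeated completion event from the same vertex is ignored. The class
DownstreamCompletionTracker and its methods are hypothetical names made up for
this example; they are not part of VertexImpl or of the attached patches.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not Tez code: tracks which downstream vertices still
// have to finish before an ancestor's intermediate data may be deleted.
final class DownstreamCompletionTracker {
  // Names of downstream vertices that have not reported completion yet.
  private final Set<String> pending;

  DownstreamCompletionTracker(Set<String> downstreamVertexNames) {
    this.pending = new HashSet<>(downstreamVertexNames);
  }

  /**
   * Records a completion event. Returns true only for the single event that
   * completes the last pending vertex. A duplicate event, or an event from an
   * unknown vertex, returns false and changes nothing.
   */
  synchronized boolean onVertexComplete(String vertexName) {
    boolean wasPending = pending.remove(vertexName);
    return wasPending && pending.isEmpty();
  }

  synchronized boolean allComplete() {
    return pending.isEmpty();
  }
}
```

A plain counter decremented per event would reach zero too early if one vertex
reported twice; removing names from a set makes the event idempotent.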
bq. When the data for a vertex is deleted, I think it'll be better to move it
into a different state, so that in case of failures / re-runs which require
data from this vertex, the vertex tasks can be re-run directly, instead of
relying on failures from the source to trigger re-runs of upstream tasks (how
slow/fast is this?). This can be problematic if the entire vertex ends up
re-running even if all data is not required by a downstream task. Ideally,
would be nice to re-run tasks when a downstream consumer requests this data.
Agreed, since we know the re-runs must happen once the data has been deleted.
This will help bypass the fetch retries and failure detection time in the
current design. Will update the patch and get back asap.
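As a rough sketch of the idea discussed above (an explicit state for a vertex
whose data has been deleted, with re-runs scheduled only for the tasks a
consumer actually asks for), something like the following could work. The
class VertexOutputModel, the DATA_DELETED state and the method names are
assumptions for illustration only; they do not exist in Tez's vertex state
machine, and the sketch leaves out what happens once a re-run finishes.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, not Tez's VertexImpl: shows on-demand re-runs after
// a vertex's intermediate data has been deleted.
final class VertexOutputModel {
  enum State { SUCCEEDED, DATA_DELETED }

  private State state = State.SUCCEEDED;
  // Task indexes scheduled for re-run, in request order, without duplicates.
  private final Set<Integer> rerunRequests = new LinkedHashSet<>();

  // Called when the vertex's intermediate data is removed.
  void deleteIntermediateData() {
    state = State.DATA_DELETED;
  }

  /**
   * A downstream consumer asks for one task's output. Returns true if the
   * output is still available. If the data was deleted, only that task is
   * scheduled for re-run (no fetch retries first) and false is returned.
   */
  boolean requestOutput(int taskIndex) {
    if (state == State.SUCCEEDED) {
      return true;
    }
    rerunRequests.add(taskIndex);
    return false;
  }

  State getState() {
    return state;
  }

  List<Integer> getRerunRequests() {
    return new ArrayList<>(rerunRequests);
  }
}
```

Because the state is known up front, the consumer's request can trigger the
re-run directly, and tasks nobody asks for are never re-run.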
> Delete intermediate data at the vertex level for Shuffle Handler
> ----------------------------------------------------------------
>
> Key: TEZ-3363
> URL: https://issues.apache.org/jira/browse/TEZ-3363
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Jonathan Eagles
> Assignee: Kuhu Shukla
> Attachments: TEZ-3363.001.patch, TEZ-3363.002.patch
>
>
> For applications like pig where processing times can be very long,
> applications may choose to delete intermediate data for a sub dag. For
> example if a DAG has synced data to HDFS, all upstream intermediate data can
> be safely deleted.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)