abstractdog commented on a change in pull request #60:
URL: https://github.com/apache/tez/pull/60#discussion_r820285137
##########
File path: tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/DAGImpl.java
##########
@@ -1772,6 +1774,52 @@ private static void parseVertexEdges(DAGImpl dag,
Map<String, EdgePlan> edgePlan
vertex.setInputVertices(inVertices);
vertex.setOutputVertices(outVertices);
+ boolean cleanupShuffleDataAtVertexLevel =
dag.dagConf.getBoolean(TezConfiguration.TEZ_AM_VERTEX_CLEANUP_ON_COMPLETION,
+ TezConfiguration.TEZ_AM_VERTEX_CLEANUP_ON_COMPLETION_DEFAULT) &&
ShuffleUtils.isTezShuffleHandler(dag.dagConf);
+ if (cleanupShuffleDataAtVertexLevel) {
+ int deletionHeight =
dag.dagConf.getInt(TezConfiguration.TEZ_AM_VERTEX_CLEANUP_HEIGHT,
+ TezConfiguration.TEZ_AM_VERTEX_CLEANUP_HEIGHT_DEFAULT);
+ getSpannedVerticesAncestors(vertex, ancestors, deletionHeight);
Review comment:
this part needs a bit of refactoring I think
1. ancestors+children parse should go into VertexImpl, as it has nothing to
do with DAGImpl
2. ancestors, children should go into a new class, something like
DeletionContext(height), which wraps everything related to vertex relationships
regarding vertex deletion
currently, VertexImpl has fields children and incompleteChildrenVertices,
which implies it's a generally usable field containing vertex relationships,
but instead, it was initiated for a given height, and makes sense only in case
of vertex deletion
if we put both the fields and the login into a separate class, it becomes
more modular
also, I would like to see unit tests for getSpannedVerticesAncestors and
getSpannedVerticesChildren, as this is a brand new, recursive logic
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]