abstractdog commented on a change in pull request #60:
URL: https://github.com/apache/tez/pull/60#discussion_r820763109
##########
File path:
tez-plugins/tez-aux-services/src/main/java/org/apache/tez/auxservices/ShuffleHandler.java
##########
@@ -1256,6 +1280,29 @@ private String getBaseLocation(String jobId, String dagId, String user) {
return baseStr;
}
+ /**
+ * Delete shuffle data in task directories belonging to a vertex.
+ */
+ private void deleteTaskDirsOfVertex(String jobId, String dagId, String vertexId, String user) throws IOException {
+ String baseStr = getBaseLocation(jobId, dagId, user);
+ FileContext lfc = FileContext.getLocalFSFileContext();
+ for(Path dagPath : lDirAlloc.getAllLocalPathsToRead(baseStr, conf)) {
+ RemoteIterator<FileStatus> status = lfc.listStatus(dagPath);
+ final JobID jobID = JobID.forName(jobId);
+ String taskDirPrefix = "attempt" + jobID.toString().replace("job", "") +
+ "_" + dagId + "_" + vertexId + "_";
+ while (status.hasNext()) {
+ FileStatus fileStatus = status.next();
+ Path attemptPath = fileStatus.getPath();
+ if (attemptPath.getName().startsWith(taskDirPrefix)) {
+ if(lfc.delete(attemptPath, true)) {
+ LOG.info("Deleted shuffle data in task directory : " + attemptPath);
Review comment:
I would lower this to LOG.debug. Under normal circumstances, we're not
interested in every deleted path at the attempt level, such as:
```
.../yarn/nm/usercache/hive/appcache/application_1646657991888_0003/dag_2/output/attempt_1646657991888_0003_2_01_000003_0_10017
```
Also, once it's at debug level, use the logger's placeholder format to avoid needless string concatenation:
```
LOG.debug("Deleted shuffle data in task directory: {}", attemptPath);
```
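To illustrate why the placeholder form helps, here is a minimal, self-contained sketch. It does not use SLF4J itself: `SketchLogger` and `CountingPath` are hypothetical stand-ins that only mimic the deferred `{}` substitution, assuming the real logger behaves the same way when debug is disabled. With concatenation the argument's `toString()` runs before the call, whether or not the message is ever written; with the placeholder it runs only if the level is enabled.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LazyLoggingSketch {

    /** Hypothetical stand-in for a Path; counts how often toString() is invoked. */
    static final class CountingPath {
        static final AtomicInteger TO_STRING_CALLS = new AtomicInteger();
        private final String value;

        CountingPath(String value) {
            this.value = value;
        }

        @Override
        public String toString() {
            TO_STRING_CALLS.incrementAndGet();
            return value;
        }
    }

    /** Hypothetical logger stand-in (NOT the SLF4J API); mimics deferred "{}" formatting. */
    static final class SketchLogger {
        private final boolean debugEnabled;
        final StringBuilder out = new StringBuilder();

        SketchLogger(boolean debugEnabled) {
            this.debugEnabled = debugEnabled;
        }

        // The caller has already built the full message, so its cost is already paid.
        void debug(String message) {
            if (debugEnabled) {
                out.append(message).append('\n');
            }
        }

        // The argument is only converted to a string if debug is enabled.
        void debug(String format, Object arg) {
            if (!debugEnabled) {
                return;
            }
            out.append(format.replace("{}", String.valueOf(arg))).append('\n');
        }
    }

    /** Returns the number of toString() calls when the message is concatenated eagerly. */
    static int toStringCallsWithConcat(boolean debugEnabled) {
        CountingPath.TO_STRING_CALLS.set(0);
        SketchLogger log = new SketchLogger(debugEnabled);
        CountingPath path = new CountingPath("attempt_example");
        log.debug("Deleted shuffle data in task directory: " + path);
        return CountingPath.TO_STRING_CALLS.get();
    }

    /** Returns the number of toString() calls when the placeholder form is used. */
    static int toStringCallsWithPlaceholder(boolean debugEnabled) {
        CountingPath.TO_STRING_CALLS.set(0);
        SketchLogger log = new SketchLogger(debugEnabled);
        CountingPath path = new CountingPath("attempt_example");
        log.debug("Deleted shuffle data in task directory: {}", path);
        return CountingPath.TO_STRING_CALLS.get();
    }

    public static void main(String[] args) {
        System.out.println("concat, debug off:      " + toStringCallsWithConcat(false));
        System.out.println("placeholder, debug off: " + toStringCallsWithPlaceholder(false));
        System.out.println("placeholder, debug on:  " + toStringCallsWithPlaceholder(true));
    }
}
```

Since this log call sits inside a loop over every attempt directory of a vertex, the saved conversions add up when debug logging is off, which is the usual production setting.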