abstractdog commented on code in PR #341:
URL: https://github.com/apache/tez/pull/341#discussion_r1547322934


##########
tez-mapreduce/src/main/java/org/apache/tez/mapreduce/hadoop/MRInputHelpers.java:
##########
@@ -889,4 +895,29 @@ public static int getDagAttemptNumber(Configuration conf) {
     return getIntProperty(conf, MRInput.TEZ_MAPREDUCE_DAG_ATTEMPT_NUMBER);
   }
 
+  public static MRSplitProto getProto(InputDataInformationEvent initEvent, JobConf jobConf) throws IOException {
+    return Strings.isNullOrEmpty(initEvent.getSerializedPath()) ? readProtoFromPayload(initEvent)
+      : readProtoFromFs(initEvent, jobConf);
+  }
+
+  private static MRSplitProto readProtoFromFs(InputDataInformationEvent initEvent, JobConf jobConf) throws IOException {
+    String serializedPath = initEvent.getSerializedPath();
+    Path filePath = new Path(serializedPath);
+    LOG.info("Reading InputDataInformationEvent from path: {}", filePath);
+
+    MRSplitProto splitProto = null;
+    FileSystem fs = FileSystem.get(filePath.toUri(), jobConf);
+
+    try (FSDataInputStream in = fs.open(filePath)) {
+      splitProto = MRSplitProto.parseFrom(in);
+      fs.delete(filePath, false);

Review Comment:
   yeah, I think under normal circumstances everything should work: parse + delete.
   if either of them fails, that's fine; it leads to a task attempt failure (and eventually a rerun).
   regarding leftover files: it's okay, the current Hive patch uses the Tez scratch dir, which is supposed to be cleaned up by Hive as well.
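
   For illustration only, the read-then-delete pattern being discussed can be sketched with plain `java.nio` (Hadoop's `FileSystem.open`/`delete` behave analogously for this purpose); `readAndDelete` is a hypothetical helper, not part of the patch:

   ```java
   import java.io.IOException;
   import java.nio.file.Files;
   import java.nio.file.Path;

   public class ReadThenDelete {
     // Read the serialized payload, then remove the file. If either step
     // throws, the caller (here, a task attempt) fails and is rerun; on the
     // happy path no leftover file remains.
     static byte[] readAndDelete(Path file) throws IOException {
       byte[] payload = Files.readAllBytes(file);
       Files.delete(file);
       return payload;
     }

     public static void main(String[] args) throws IOException {
       Path tmp = Files.createTempFile("split", ".pb");
       Files.write(tmp, new byte[] {1, 2, 3});
       byte[] data = readAndDelete(tmp);
       System.out.println(data.length);       // prints 3
       System.out.println(Files.exists(tmp)); // prints false
     }
   }
   ```

   A failed `delete` surfaces as an `IOException` just like a failed parse, so both map onto the same task-attempt-failure path described above.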



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
