guanziyue commented on code in PR #4913:
URL: https://github.com/apache/hudi/pull/4913#discussion_r1205729543
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java:
##########
@@ -273,4 +281,32 @@ protected static Option<IndexedRecord> toAvroRecord(HoodieRecord record, Schema
return Option.empty();
}
}
+
+ protected class AppendLogWriteCallback extends DefaultHoodieLogFileWriteCallBack {
+ // Here we distinguish log files being created from log files being appended to. Consider the following scenario:
+ // An appending task writes to a log file:
+ // (1) append to existing file file_instant_writetoken1.log.1
+ // (2) roll over and create file file_instant_writetoken2.log.2
+ // Then this task fails and is retried by a new task:
+ // (3) append to existing file file_instant_writetoken1.log.1
+ // (4) roll over and create file file_instant_writetoken3.log.2
+ // Finally, file_instant_writetoken2.log.2 should not be committed to Hudi; we use the marker file to delete it.
+ // Keep in mind that a log file is not always fail-safe unless it never rolls over.
+
+ @Override
+ public boolean preLogFileOpen(HoodieLogFile logFileToAppend) {
+ // Here we use createIfNotExists because, in some rare cases, the task is pulled up again with the same write file name,
+ // e.g. reusing the small log files from the last commit instant in Flink, or a task retry in Spark.
+ WriteMarkers writeMarkers = WriteMarkersFactory.get(config.getMarkersType(), hoodieTable, instantTime);
+ return writeMarkers.create(partitionPath, logFileToAppend.getFileName(), IOType.APPEND,
Review Comment:
Will correct the comment. The first implementation used `createIfNotExists`. Since then, a new feature, "early conflict detection", has been added that needs `create`, so we use `create` here to keep that feature working.
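To make the trade-off concrete, here is a minimal, hypothetical sketch of the retry scenario from the code comment. `InMemoryMarkers` is an illustrative stand-in, not Hudi's `WriteMarkers` API, and the sketch assumes a simplified reconciliation rule: any log file that has a marker but is absent from the commit metadata gets deleted. It does not model the `IOType` of each marker. It shows why a strict `create` surfaces the retry's second marker for the same file (useful for conflict detection), while `createIfNotExists` silently tolerates it.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

/**
 * Hypothetical, simplified model of marker-based cleanup for rolled-over log files.
 * InMemoryMarkers is NOT Hudi's WriteMarkers API; it only mimics the difference
 * between a strict create and an idempotent createIfNotExists.
 */
final class MarkerRetrySketch {

  static final class InMemoryMarkers {
    // Marker names; a TreeSet keeps iteration order deterministic.
    private final Set<String> markers = new TreeSet<>();

    /** Strict create: returns false if the marker already exists. */
    boolean create(String logFileName) {
      return markers.add(logFileName);
    }

    /** Idempotent create: always succeeds, adding the marker only if it is missing. */
    boolean createIfNotExists(String logFileName) {
      markers.add(logFileName);
      return true;
    }

    Set<String> all() {
      return markers;
    }
  }

  /** Assumed rule: files that have a marker but are absent from the commit get deleted. */
  static Set<String> filesToDelete(Set<String> markers, Set<String> committed) {
    return markers.stream()
        .filter(name -> !committed.contains(name))
        .collect(Collectors.toCollection(TreeSet::new));
  }

  public static void main(String[] args) {
    InMemoryMarkers markers = new InMemoryMarkers();

    // First attempt: (1) append to an existing file, (2) roll over to a new file.
    markers.create("file_instant_writetoken1.log.1");
    markers.create("file_instant_writetoken2.log.2");

    // The first attempt fails. The retry (3) appends to the same file again:
    // a strict create reports the existing marker, createIfNotExists tolerates it.
    boolean strict = markers.create("file_instant_writetoken1.log.1");
    boolean idempotent = markers.createIfNotExists("file_instant_writetoken1.log.1");
    // (4) The retry rolls over to a file carrying its own write token.
    markers.create("file_instant_writetoken3.log.2");

    // Only the files written by the successful retry end up in the commit metadata.
    Set<String> committed = Set.of(
        "file_instant_writetoken1.log.1", "file_instant_writetoken3.log.2");

    System.out.println("strict create on retry: " + strict);
    System.out.println("createIfNotExists on retry: " + idempotent);
    System.out.println("to delete: " + filesToDelete(markers.all(), committed));
  }
}
```

In this model the orphaned `file_instant_writetoken2.log.2` is the only file selected for deletion. The `strict` result being `false` is exactly the situation the original comment worried about for retries; with early conflict detection, that signal is what the strict `create` is meant to expose.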