KnightChess commented on code in PR #9035:
URL: https://github.com/apache/hudi/pull/9035#discussion_r1315492039
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java:
##########
@@ -140,10 +157,99 @@ protected Path makeNewFilePath(String partitionPath, String fileName) {
* Creates an empty marker file corresponding to storage writer path.
*
* @param partitionPath Partition path
+ * @param dataFileName data file for which inprogress marker creation is requested
+ * @param markerInstantTime - instantTime associated with the request
+ * returns true - inprogress marker successfully created,
+ * false - inprogress marker was not created.
+ */
+ protected void createInProgressMarkerFile(String partitionPath, String dataFileName, String markerInstantTime) {
+ WriteMarkers writeMarkers = WriteMarkersFactory.get(config.getMarkersType(), hoodieTable, instantTime);
+ if (!writeMarkers.doesMarkerDirExist()) {
+ throw new HoodieIOException(String.format("Marker root directory absent : %s/%s (%s)",
+ partitionPath, dataFileName, markerInstantTime));
+ }
+ writeMarkers.create(partitionPath, dataFileName, getIOType());
+ }
+
+ protected boolean recoverWriteStatusIfAvailable(String partitionPath, String dataFileName,
+ String markerInstantTime) {
+ WriteMarkers writeMarkers = WriteMarkersFactory.get(config.getMarkersType(), hoodieTable, instantTime);
+ if (config.isFailRetriesAfterFinalizeWrite()
+ && writeMarkers.markerExists(writeMarkers.getCompletionMarkerPath(StringUtils.EMPTY_STRING,
+ FINALIZE_WRITE_COMPLETED, markerInstantTime, IOType.CREATE))) {
+ throw new HoodieCorruptedDataException(" Failing retry attempt for instant " + instantTime
+ + " as the job is trying to re-write the data files, after writes have been finalized.");
+ }
+ if (config.optimizeTaskRetriesWithMarkers()
+ && writeMarkers.markerExists(writeMarkers.getCompletionMarkerPath(partitionPath, fileId, markerInstantTime, getIOType()))) {
Review Comment:
I have a question. Suppose one partition split produces two parquet files in stage 1, named:
- `p1/f1-0_0-1-0_001.parquet`
- `p1/f1-1_0-1-0_001.parquet`

Execution:
1. `p1/f1-0_0-1-0_001.parquet` is written successfully, but `p1/f1-1_0-1-0_001.parquet` fails.
2. A `p1/f1-0_001.success.CREATE` file now exists.
3. When the task or stage is retried, the check here finds `p1/f1-0_001.success.CREATE` and returns true, so the handle will be null. Will that cause a NullPointerException? And will `p1/f1-1_0-x-x_001.parquet` never be produced?

Please correct me if I am wrong.
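
The scenario above can be sketched as a toy model of the marker lookup. This is not Hudi code: the class `MarkerRetrySketch`, the helpers `completionMarker` and `canSkipOnRetry`, and the marker name layout (taken from the `p1/f1-0_001.success.CREATE` example in the comment) are all assumptions made for illustration, with an in-memory set standing in for the marker directory.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the retry scenario; NOT Hudi code. All names here are hypothetical.
public class MarkerRetrySketch {

    // Builds a marker name shaped like the example in the comment,
    // e.g. p1/f1-0_001.success.CREATE (the exact layout is an assumption).
    public static String completionMarker(String partitionPath, String fileId,
                                          String instantTime, String ioType) {
        return partitionPath + "/" + fileId + "_" + instantTime + ".success." + ioType;
    }

    // Stand-in for the markerExists(getCompletionMarkerPath(...)) check in the diff:
    // true means the retry would treat this file id as already written.
    public static boolean canSkipOnRetry(Set<String> markers, String partitionPath,
                                         String fileId, String instantTime) {
        return markers.contains(completionMarker(partitionPath, fileId, instantTime, "CREATE"));
    }

    public static void main(String[] args) {
        Set<String> markers = new HashSet<>();

        // Steps 1 and 2: only the first file finished, so only its marker exists.
        markers.add(completionMarker("p1", "f1-0", "001", "CREATE"));

        // Step 3: the retry consults the markers for each file id.
        System.out.println("f1-0 skipped on retry: " + canSkipOnRetry(markers, "p1", "f1-0", "001"));
        System.out.println("f1-1 skipped on retry: " + canSkipOnRetry(markers, "p1", "f1-1", "001"));
    }
}
```

In this model the lookup is keyed by file id, so only `f1-0` is reported as already written. Whether the real check behaves the same way depends on which `fileId` the retried handle passes to `getCompletionMarkerPath`, which is the point the question raises.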