TengHuo commented on code in PR #6733:
URL: https://github.com/apache/hudi/pull/6733#discussion_r1010026246


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactionPlanOperator.java:
##########
@@ -129,9 +128,6 @@ private void scheduleCompaction(HoodieFlinkTable<?> table, long checkpointId) th
       List<CompactionOperation> operations = compactionPlan.getOperations().stream()
           .map(CompactionOperation::convertFromAvroRecordInstance).collect(toList());
       LOG.info("Execute compaction plan for instant {} as {} file groups", compactionInstantTime, operations.size());
-      WriteMarkersFactory
-          .get(table.getConfig().getMarkersType(), table, compactionInstantTime)
-          .deleteMarkerDir(table.getContext(), table.getConfig().getMarkersDeleteParallelism());

Review Comment:
   Yeah, you are right. I agree that there shouldn't be a duplicate marker 
file in the first place.
   
   A duplicate marker file means a duplicate data file. So to fix the 
duplicate marker file issue (https://issues.apache.org/jira/browse/HUDI-4108), 
we have to find out how the duplicate marker file was generated. Otherwise, we 
should delete the marker file and the data file together instead of deleting 
only the marker file directory.
   
   My PR can't fix the duplicate marker file issue as I haven't encountered the 
same problem as [HUDI-4108](https://issues.apache.org/jira/browse/HUDI-4108).
   
   As for the rollback logic in `CompactionPlanOperator#open` and 
`CompactionCommitSink#commitIfNecessary`, it should delete the leftover data 
files if anything goes wrong during compaction, but according to our log files, 
it didn't work properly.
   
   Let me check why rollback is not working properly in our pipeline. I will 
reply here later.
   
   My idea is that we shouldn't delete the marker file directory here. It 
should be deleted in the rollback function, at the point where the data files 
are deleted.
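
   To make the idea concrete, here is a rough sketch of deleting each marker 
together with the data file it points to, and removing the marker directory 
only afterwards. It is not Hudi's rollback code: the class 
`MarkerRollbackSketch`, the method `rollback`, and the plain `.marker` suffix 
are hypothetical simplifications, and it uses only `java.nio.file` on a local 
file system.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; names and marker layout are assumptions.
public class MarkerRollbackSketch {
  // Assumed naming: marker = <relative data file path> + ".marker"
  static final String MARKER_SUFFIX = ".marker";

  /** Deletes every data file referenced by a marker, then the markers, then the marker dir. */
  static List<Path> rollback(Path basePath, Path markerDir) throws IOException {
    List<Path> deletedDataFiles = new ArrayList<>();
    if (!Files.isDirectory(markerDir)) {
      return deletedDataFiles; // nothing was written for this instant
    }

    // Collect marker files first so we do not delete while walking the tree.
    List<Path> markers;
    try (Stream<Path> walk = Files.walk(markerDir)) {
      markers = walk.filter(Files::isRegularFile)
          .filter(p -> p.getFileName().toString().endsWith(MARKER_SUFFIX))
          .collect(Collectors.toList());
    }

    for (Path marker : markers) {
      // Map the marker back to the data file it stands for.
      String relative = markerDir.relativize(marker).toString();
      String dataRelative = relative.substring(0, relative.length() - MARKER_SUFFIX.length());
      Path dataFile = basePath.resolve(dataRelative);

      // Delete the data file first, then its marker, so a marker never outlives
      // the cleanup of the file it describes.
      if (Files.deleteIfExists(dataFile)) {
        deletedDataFiles.add(dataFile);
      }
      Files.delete(marker);
    }

    // Remove whatever is left of the marker directory, children before parents.
    List<Path> remaining;
    try (Stream<Path> walk = Files.walk(markerDir)) {
      remaining = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : remaining) {
      Files.delete(p);
    }
    return deletedDataFiles;
  }
}
```

   With this ordering, the marker directory disappears only after the data 
files it tracks are gone, which is the behaviour I would expect from rollback 
instead of the unconditional `deleteMarkerDir` call removed in this diff.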


