TengHuo commented on code in PR #6733:
URL: https://github.com/apache/hudi/pull/6733#discussion_r1010026246
##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactionPlanOperator.java:
##########
@@ -129,9 +128,6 @@ private void scheduleCompaction(HoodieFlinkTable<?> table, long checkpointId) th
List<CompactionOperation> operations =
compactionPlan.getOperations().stream()
.map(CompactionOperation::convertFromAvroRecordInstance).collect(toList());
LOG.info("Execute compaction plan for instant {} as {} file groups", compactionInstantTime, operations.size());
- WriteMarkersFactory
- .get(table.getConfig().getMarkersType(), table, compactionInstantTime)
- .deleteMarkerDir(table.getContext(), table.getConfig().getMarkersDeleteParallelism());
Review Comment:
Yeah, you are right. I agree that there shouldn't be a duplicate marker
file in the first place.
A duplicate marker file means a duplicate data file. So to fix the
duplicate marker file issue (https://issues.apache.org/jira/browse/HUDI-4108),
we have to find out how the duplicate marker file is generated. Otherwise, we
should delete the marker file and the data file together instead of deleting
only the marker file directory.
My PR can't fix the duplicate marker file issue, as I haven't encountered the
same problem as [HUDI-4108](https://issues.apache.org/jira/browse/HUDI-4108).
About the rollback logic in `CompactionPlanOperator#open` and
`CompactionCommitSink#commitIfNecessary`: it should delete the leftover data
files if anything goes wrong during compaction, but according to our log files,
it didn't work properly.
Let me check why rollback is not working properly in our pipeline. I will
reply here later.
My idea is that we shouldn't delete the marker file directory here. It should
be deleted in the rollback function, at the point where the data files are
deleted.
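To make the idea concrete, here is a minimal sketch of "delete the data files and the markers together". It is not Hudi's rollback code: the class `MarkerRollbackSketch`, the `rollback` method, and the `<dataFileName>.marker` naming are all assumptions made up for illustration, and it uses plain `java.nio.file` instead of Hudi's `WriteMarkers` API. The point is only the ordering: the markers are read first to find the leftover data files, and the marker directory is removed last.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch only; names and marker layout are assumptions, not Hudi's API. */
public class MarkerRollbackSketch {

  // Assumed marker naming for this sketch: "<dataFileName>.marker".
  private static final String MARKER_SUFFIX = ".marker";

  /**
   * Deletes every data file referenced by a marker, then removes the marker
   * directory. Returns the data files that were actually deleted.
   */
  public static List<Path> rollback(Path dataDir, Path markerDir) throws IOException {
    List<Path> deleted = new ArrayList<>();
    if (!Files.isDirectory(markerDir)) {
      return deleted; // no markers, so nothing to roll back for this instant
    }

    // Step 1: collect the markers while the directory still exists.
    List<Path> markers;
    try (Stream<Path> entries = Files.list(markerDir)) {
      markers = entries
          .filter(p -> p.getFileName().toString().endsWith(MARKER_SUFFIX))
          .sorted()
          .collect(Collectors.toList());
    }

    // Step 2: delete the data file each marker points at, if it was written.
    for (Path marker : markers) {
      String name = marker.getFileName().toString();
      String dataFileName = name.substring(0, name.length() - MARKER_SUFFIX.length());
      Path dataFile = dataDir.resolve(dataFileName);
      if (Files.deleteIfExists(dataFile)) {
        deleted.add(dataFile);
      }
    }

    // Step 3: only now remove the marker directory (children before parent).
    try (Stream<Path> walk = Files.walk(markerDir)) {
      List<Path> paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
      for (Path p : paths) {
        Files.delete(p);
      }
    }
    return deleted;
  }
}
```

If the marker directory is deleted earlier, as in the removed lines of this diff, step 1 has nothing to read and the leftover data files can no longer be found from the markers.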