kbuci commented on PR #11580: URL: https://github.com/apache/hudi/pull/11580#issuecomment-2380097399
> I was chasing some test failures in this patch and realized that Flink might have an issue. In [this](https://github.com/apache/hudi/blob/ed65de1460468ad33a374a66606c0baae6cc129b/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/CompactionUtil.java#L78) code block, we generate a compaction time in the past. So the additional validation in this patch may not sit well with Flink.

@nsivabalan My understanding of MOR compaction on the latest 0.x is likely out of date, so apologies if this comment does not make sense, but I assumed that (in 0.x) once a compaction plan with instant time T targeting a file group is created, any write (deltacommit) with an instant time greater than T will create a new log file with an instant time of T (assuming appends are disabled).

If that is the case, consider a MOR dataset with [C0.deltacommit, C2.deltacommit.inflight] where a compaction plan is then scheduled with an earlier timestamp, yielding [C0.deltacommit, C1.compaction.requested, C2.deltacommit.inflight]. There might be no issue on the base table as long as C2 fails during write-conflict resolution. But if this MOR dataset has a metadata table, we might find ourselves in the same case we discussed offline (the first scenario in https://issues.apache.org/jira/browse/HUDI-7507), specifically if the writer that worked on C2 (or a greater instant) scheduled a compaction on the metadata table.
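To make the ordering concern above concrete, here is a minimal sketch (not Hudi's actual API; the function name and timeline representation are hypothetical) of the invariant being discussed: a compaction instant scheduled "in the past" is unsafe whenever a pending deltacommit already holds a greater instant time, since that writer may not route its log blocks under the new plan. Instant times are compared lexicographically, as Hudi does.

```python
def is_compaction_instant_safe(compaction_time, timeline):
    """Hypothetical check: a compaction scheduled at `compaction_time` is
    only safe if no pending (requested/inflight) deltacommit has a greater
    instant time. `timeline` maps instant time -> (action, state)."""
    pending = [t for t, (action, state) in timeline.items()
               if action == "deltacommit" and state != "completed"]
    # Lexicographic string comparison mirrors Hudi instant-time ordering.
    return all(t <= compaction_time for t in pending)

# Timeline from the scenario above: C0 completed, C2 inflight, and a
# compaction scheduled with the earlier timestamp C1.
timeline = {
    "C0": ("deltacommit", "completed"),
    "C2": ("deltacommit", "inflight"),
}
print(is_compaction_instant_safe("C1", timeline))  # False: pending C2 > C1
print(is_compaction_instant_safe("C3", timeline))  # True: no pending instant exceeds C3
```

Under this framing, scheduling C1 after C2 is already inflight violates the invariant, which is exactly the situation the Flink code path can produce.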
