voonhous commented on code in PR #7997:
URL: https://github.com/apache/hudi/pull/7997#discussion_r1118217290
##########
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieFileGroup.java:
##########
@@ -122,7 +129,24 @@ public HoodieFileGroupId getFileGroupId() {
* some log files, that are based off a commit or delta commit.
*/
private boolean isFileSliceCommitted(FileSlice slice) {
- if (!compareTimestamps(slice.getBaseInstantTime(), LESSER_THAN_OR_EQUALS,
lastInstant.get().getTimestamp())) {
+ if (compareTimestamps(slice.getBaseInstantTime(), GREATER_THAN,
lastInstant.get().getTimestamp())) {
+ return false;
+ }
+
+ if (!slice.getBaseFile().isPresent() &&
timeline.isBeforeTimelineStarts(slice.getBaseInstantTime())) {
Review Comment:
IIUC, this issue is only specific to Flink writers as the the write and
rollback logic might execute concurrently.
While this will fix the issue, i agree that the cleaner way is for MOR to
delete the log files so that the responsibility of writers and rollback
executors are clearly defined. i.e. Writers should not know or handle the
quirks of rollback executors. The current way of fixing things does introduce
code smell.
To be honest, I am getting a lot of resistance and pushback with this fix,
so I am going to take small steps. I have already spent 8 hours writing a test
case and there doesn't seem to be an agreement in which direction everybody
wants to take.
@danny0405 @bvaradar LMK which direction the both of you would like to take
to fix this issue.
Thank you.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]