yihua commented on code in PR #9473:
URL: https://github.com/apache/hudi/pull/9473#discussion_r1320357906
##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java:
##########
@@ -130,7 +130,7 @@ public static QueryInfo generateQueryInfo(JavaSparkContext
jssc, String srcBaseP
}
});
Review Comment:
```suggestion
// When `beginInstantTime` is `DEFAULT_BEGIN_TIMESTAMP` (due to missing
checkpoint), `previousInstantTime` is set to `DEFAULT_BEGIN_TIMESTAMP` as well.
```
##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java:
##########
@@ -130,7 +130,7 @@ public static QueryInfo generateQueryInfo(JavaSparkContext
jssc, String srcBaseP
}
});
- String previousInstantTime = beginInstantTime;
+ String previousInstantTime = DEFAULT_BEGIN_TIMESTAMP;
if (!beginInstantTime.equals(DEFAULT_BEGIN_TIMESTAMP)) {
Review Comment:
```suggestion
// When `beginInstantTime` is present, `previousInstantTime` is set to the
completed commit before `beginInstantTime` if that exists. If there is no
completed commit before `beginInstantTime`, e.g., `beginInstantTime` is the
first commit in the active timeline, `previousInstantTime` is set to
`DEFAULT_BEGIN_TIMESTAMP`.
if (!beginInstantTime.equals(DEFAULT_BEGIN_TIMESTAMP)) {
```
##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java:
##########
@@ -130,7 +130,7 @@ public static QueryInfo generateQueryInfo(JavaSparkContext
jssc, String srcBaseP
}
});
- String previousInstantTime = beginInstantTime;
+ String previousInstantTime = DEFAULT_BEGIN_TIMESTAMP;
Review Comment:
Have you also gone through the logic for
`MissingCheckpointStrategy.READ_LATEST` and see if there's any gap?
##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java:
##########
@@ -130,7 +130,7 @@ public static QueryInfo generateQueryInfo(JavaSparkContext
jssc, String srcBaseP
}
});
- String previousInstantTime = beginInstantTime;
+ String previousInstantTime = DEFAULT_BEGIN_TIMESTAMP;
Review Comment:
Basically, the only change here is when `beginInstantTime === <first
commit>`, the `previousInstantTime` is `DEFAULT_BEGIN_TIMESTAMP` now whereas
`<first commit>` before this PR.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]