danny0405 commented on code in PR #8876:
URL: https://github.com/apache/hudi/pull/8876#discussion_r1217616438
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala:
##########
@@ -60,11 +60,14 @@ case class MergeOnReadIncrementalRelation(override val
sqlContext: SQLContext,
override protected def timeline: HoodieTimeline = {
if (fullTableScan) {
- metaClient.getCommitsAndCompactionTimeline
+ filterTimelineForIncrementalQueryIfNeeded(metaClient,
Review Comment:
I have a similar concern: for batch incremental queries with an explicit start
and end timestamp, there is no need to skip any completed commits. The invoker
should handle the 'hollow' instants (inflight within the queried time range but
completing later) themselves.
For the streaming source, we could checkpoint both the max completion
time (transition time) and the max version id (instant time); these two timestamps
would let us detect the 'hollow' instants and fix the reader automatically.
The streaming reader fix generally makes more sense to users.
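To illustrate the checkpoint idea, here is a minimal sketch (hypothetical types and names, not Hudi's actual `HoodieTimeline` API): given a checkpointed max instant time and max completion time, an instant is 'hollow' if it started within the already-consumed instant-time range but only completed after the checkpointed completion time, so a previous read missed it.

```scala
// Hypothetical model of a timeline instant carrying both timestamps;
// real Hudi instants expose these via the timeline/metadata APIs.
case class Instant(instantTime: String, completionTime: String)

// An instant is "hollow" w.r.t. a checkpoint (maxInstantTime, maxCompletionTime)
// if it falls inside the consumed instant-time range but transitioned to
// completed only after the checkpointed max completion time.
def findHollowInstants(instants: Seq[Instant],
                       maxInstantTime: String,
                       maxCompletionTime: String): Seq[Instant] =
  instants.filter { i =>
    i.instantTime <= maxInstantTime && i.completionTime > maxCompletionTime
  }
```

A streaming reader could then re-emit the hollow instants on the next batch instead of silently skipping them.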
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]