boneanxs commented on code in PR #8876:
URL: https://github.com/apache/hudi/pull/8876#discussion_r1217591203
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala:
##########
@@ -60,11 +60,14 @@ case class MergeOnReadIncrementalRelation(override val
sqlContext: SQLContext,
override protected def timeline: HoodieTimeline = {
if (fullTableScan) {
- metaClient.getCommitsAndCompactionTimeline
+ filterTimelineForIncrementalQueryIfNeeded(metaClient,
Review Comment:
Oh, sorry, this is not only related to `fullTableScan`, but also to normal
incremental reads. If users manually pass `startTimestamp` and `endTimestamp` to
perform an incremental read, they might not read all the data they need, since
some instants could be filtered out because of a preceding non-completed commit.
This changes the previous behavior.
I think in most cases we are concerned that streaming jobs (like
`HoodieStreamingSource`) might miss non-completed commits, but for incremental
jobs where the user explicitly specifies `endTimestamp`, do we also need to keep
the strict behavior here?
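
To make the concern concrete, here is a minimal, self-contained sketch (not Hudi's actual implementation; the `Instant` record and `filterAtFirstIncomplete` helper are hypothetical stand-ins) of the kind of filtering being discussed: truncating the timeline at the earliest non-completed instant. It shows how a completed commit inside a user-supplied `[startTimestamp, endTimestamp]` range can be dropped when an earlier commit is still in flight:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class TimelineFilterSketch {
    // Hypothetical stand-in for a Hudi instant: a timestamp plus completion state.
    record Instant(String timestamp, boolean completed) {}

    // Sketch of the filtering under discussion: truncate the timeline at the
    // first non-completed ("hollow") instant so readers never skip over it.
    static List<Instant> filterAtFirstIncomplete(List<Instant> timeline) {
        Optional<String> firstIncomplete = timeline.stream()
                .filter(i -> !i.completed())
                .map(Instant::timestamp)
                .min(Comparator.naturalOrder());
        if (firstIncomplete.isEmpty()) {
            return timeline; // nothing in flight, keep everything
        }
        String cutoff = firstIncomplete.get();
        return timeline.stream()
                .filter(i -> i.timestamp().compareTo(cutoff) < 0)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Instant> timeline = List.of(
                new Instant("001", true),
                new Instant("002", false),  // non-completed commit
                new Instant("003", true));
        // Commit "003" is completed and could fall inside a user-supplied
        // [startTimestamp, endTimestamp] range, yet the filter drops it --
        // the behavior change pointed out above.
        System.out.println(filterAtFirstIncomplete(timeline));
    }
}
```

With the sample timeline above, only instant "001" survives the filter; "003" is excluded even though it is complete.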
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]