boneanxs commented on code in PR #8876:
URL: https://github.com/apache/hudi/pull/8876#discussion_r1217591203
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala:
##########
@@ -60,11 +60,14 @@ case class MergeOnReadIncrementalRelation(override val
sqlContext: SQLContext,
override protected def timeline: HoodieTimeline = {
if (fullTableScan) {
- metaClient.getCommitsAndCompactionTimeline
+ filterTimelineForIncrementalQueryIfNeeded(metaClient,
Review Comment:
Oh, sorry, this is not only related to `fullTableScan`, but also to normal
incremental reads. If users manually pass `startTimestamp` and `endTimestamp` to
perform an incremental read, they might not read all the data they need, since
some instants could be filtered out because of a preceding non-completed commit.
This changes the previous behavior.
I think in most cases we are concerned that streaming jobs (like
`HoodieStreamingSource`) might miss non-completed commits, but for incremental
jobs where the user explicitly specifies `endTimestamp`, do we also need to keep
the strict behavior here?
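
To make the concern concrete, here is a minimal, self-contained sketch (not Hudi's actual implementation; the `Instant` record and `filterAtFirstIncomplete` helper are hypothetical stand-ins) of the kind of filtering being discussed: truncating the timeline at the earliest non-completed instant. It shows how a completed commit inside a user-supplied `[startTimestamp, endTimestamp]` range can be dropped when an earlier commit is still in flight:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class TimelineFilterSketch {
    // Hypothetical stand-in for a Hudi instant: a timestamp plus completion state.
    record Instant(String timestamp, boolean completed) {}

    // Sketch of the filtering under discussion: truncate the timeline at the
    // first non-completed ("hollow") instant so readers never skip over it.
    static List<Instant> filterAtFirstIncomplete(List<Instant> timeline) {
        Optional<String> firstIncomplete = timeline.stream()
                .filter(i -> !i.completed())
                .map(Instant::timestamp)
                .min(Comparator.naturalOrder());
        if (firstIncomplete.isEmpty()) {
            return timeline; // nothing in flight, keep everything
        }
        String cutoff = firstIncomplete.get();
        return timeline.stream()
                .filter(i -> i.timestamp().compareTo(cutoff) < 0)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Instant> timeline = List.of(
                new Instant("001", true),
                new Instant("002", false),  // non-completed commit
                new Instant("003", true));
        // Commit "003" is completed and could fall inside a user-supplied
        // [startTimestamp, endTimestamp] range, yet the filter drops it --
        // the behavior change pointed out above.
        System.out.println(filterAtFirstIncomplete(timeline));
    }
}
```

With the sample timeline above, only instant "001" survives the filter; "003" is excluded even though it is complete.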
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]