danny0405 commented on code in PR #8876:
URL: https://github.com/apache/hudi/pull/8876#discussion_r1217616438
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala:
##########
@@ -60,11 +60,14 @@ case class MergeOnReadIncrementalRelation(override val
sqlContext: SQLContext,
override protected def timeline: HoodieTimeline = {
if (fullTableScan) {
- metaClient.getCommitsAndCompactionTimeline
+ filterTimelineForIncrementalQueryIfNeeded(metaClient,
Review Comment:
I have a similar concern: for batch incremental queries with an explicit start
and end timestamp, there is no need to skip any completed commits. The invoker
should handle the 'hollow' instants (inflight within the queried time range but
completing later) themselves.
For the streaming source, we could checkpoint both the max completion
time (transition time) and the max version id (instant time); these two timestamps
would let us detect the 'hollow' instants and fix the reader automatically.
The streaming reader fix generally makes more sense to users.
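To illustrate the checkpoint idea, here is a minimal sketch (hypothetical types and names, not Hudi's actual `HoodieTimeline` API): given a checkpointed max instant time and max completion time, an instant is 'hollow' if it started within the already-consumed instant-time range but only completed after the checkpointed completion time, so a previous read missed it.

```scala
// Hypothetical model of a timeline instant carrying both timestamps;
// real Hudi instants expose these via the timeline/metadata APIs.
case class Instant(instantTime: String, completionTime: String)

// An instant is "hollow" w.r.t. a checkpoint (maxInstantTime, maxCompletionTime)
// if it falls inside the consumed instant-time range but transitioned to
// completed only after the checkpointed max completion time.
def findHollowInstants(instants: Seq[Instant],
                       maxInstantTime: String,
                       maxCompletionTime: String): Seq[Instant] =
  instants.filter { i =>
    i.instantTime <= maxInstantTime && i.completionTime > maxCompletionTime
  }
```

A streaming reader could then re-emit the hollow instants on the next batch instead of silently skipping them.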
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]