Re: [PR] [HUDI-8141] Incremental Query with Completion Time [hudi]

via GitHub Tue, 15 Oct 2024 17:00:49 -0700


yihua commented on code in PR #11947:
URL: https://github.com/apache/hudi/pull/11947#discussion_r1802136616



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/IncrementalRelation.scala:
##########
@@ -305,14 +286,17 @@ class IncrementalRelation(val sqlContext: SQLContext,
   }
 
   private def fullTableScanDataFrame(startInstantTime: String, endInstantTime: String): DataFrame = {
+    val commitTimesToReturn = commitsToReturn.map(_.getTimestamp)
     val hudiDF = sqlContext.read
       .format("hudi_v1")
       .schema(usedSchema)
       .load(basePath.toString)
-      .filter(String.format("%s > '%s'", HoodieRecord.COMMIT_TIME_METADATA_FIELD, //Notice the > in place of >= because we are working with optParam instead of first commit > optParam
+      // using closed_closed range here since we are using start instant from QueryContext
+      .filter(String.format("%s >= '%s'", HoodieRecord.COMMIT_TIME_METADATA_FIELD,
         startInstantTime))
       .filter(String.format("%s <= '%s'", HoodieRecord.COMMIT_TIME_METADATA_FIELD,

Review Comment:
   I think we can remove this range filter, as the `IN` predicate should also enable predicate pushdown and efficient filtering with column ranges in Parquet.
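   The pruning argument above can be sketched with a toy model of per-row-group min/max statistics on the commit-time column. This is an illustration under assumptions, not how Parquet or Spark actually evaluate pushed-down filters (whether an `IN` list is pushed into Parquet, and how it is evaluated, depends on the Spark and Parquet versions). `RowGroupStats`, `mayMatchRange`, `mayMatchIn`, and the sample instant times are all hypothetical, and the sketch assumes instant times are fixed-width strings so lexicographic order matches time order.

```scala
// Toy model of row-group pruning with min/max column statistics.
// RowGroupStats and the helpers below are hypothetical, not Parquet or Spark APIs.
object PruningSketch {

  // Assumed min/max stats of the commit-time column for one row group.
  // Instant times are assumed fixed-width, so string order equals time order.
  final case class RowGroupStats(name: String, min: String, max: String)

  // A closed range [start, end] can match the group only if the two ranges overlap.
  def mayMatchRange(g: RowGroupStats, start: String, end: String): Boolean =
    g.max >= start && g.min <= end

  // An IN list can match the group only if at least one value falls inside [min, max].
  def mayMatchIn(g: RowGroupStats, values: Set[String]): Boolean =
    values.exists(v => v >= g.min && v <= g.max)

  // Hypothetical row groups, one per day.
  val groups: Seq[RowGroupStats] = Seq(
    RowGroupStats("rg0", "20241001000000000", "20241001235959999"),
    RowGroupStats("rg1", "20241005000000000", "20241005235959999"),
    RowGroupStats("rg2", "20241010000000000", "20241010235959999")
  )

  // Hypothetical commit times selected by the incremental query.
  val commitTimes: Set[String] = Set("20241001120000000", "20241010080000000")

  // Row groups kept by a closed range spanning the smallest and largest commit time.
  val keptByRange: Seq[String] =
    groups.filter(mayMatchRange(_, commitTimes.min, commitTimes.max)).map(_.name)

  // Row groups kept by an IN predicate over the exact commit times.
  val keptByIn: Seq[String] =
    groups.filter(mayMatchIn(_, commitTimes)).map(_.name)

  def main(args: Array[String]): Unit = {
    println(s"range keeps: $keptByRange")
    println(s"IN keeps:    $keptByIn")
  }
}
```

   With these sample values the closed range keeps all three row groups, while the `IN` check keeps only `rg0` and `rg2`: in this model, min/max statistics let an `IN` list prune at least as much as the range that spans it, which is why the extra range filter would be redundant.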


