yihua commented on code in PR #11947:
URL: https://github.com/apache/hudi/pull/11947#discussion_r1802136616
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/IncrementalRelation.scala:
##########
@@ -305,14 +286,17 @@ class IncrementalRelation(val sqlContext: SQLContext,
   }
   private def fullTableScanDataFrame(startInstantTime: String, endInstantTime: String): DataFrame = {
+    val commitTimesToReturn = commitsToReturn.map(_.getTimestamp)
     val hudiDF = sqlContext.read
       .format("hudi_v1")
       .schema(usedSchema)
       .load(basePath.toString)
-      .filter(String.format("%s > '%s'", HoodieRecord.COMMIT_TIME_METADATA_FIELD, //Notice the > in place of >= because we are working with optParam instead of first commit > optParam
+      // using closed_closed range here since we are using start instant from QueryContext
+      .filter(String.format("%s >= '%s'", HoodieRecord.COMMIT_TIME_METADATA_FIELD,
         startInstantTime))
       .filter(String.format("%s <= '%s'", HoodieRecord.COMMIT_TIME_METADATA_FIELD,
Review Comment:
I think we can remove this range filter, as the `IN` predicate should also enable predicate
pushdown and efficient filtering with column ranges in Parquet.
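
To make the reasoning concrete, here is a minimal sketch, not code from this PR. It assumes the commit times end up in an `IN` filter on the commit-time metadata column, and that instant timestamps are fixed-width strings so lexicographic order matches time order. `InPredicateSketch`, `inPredicate`, and `rangeIsRedundant` are hypothetical names used only for illustration.

```scala
// Hypothetical helper, not part of Hudi: names below are for illustration only.
object InPredicateSketch {
  // Value of HoodieRecord.COMMIT_TIME_METADATA_FIELD.
  val CommitTimeField: String = "_hoodie_commit_time"

  // Builds a filter expression such as: _hoodie_commit_time IN ('t1', 't2').
  // An empty list yields a predicate that matches nothing.
  def inPredicate(field: String, commitTimes: Seq[String]): String =
    if (commitTimes.isEmpty) "1 = 0"
    else commitTimes.map(t => s"'$t'").mkString(s"$field IN (", ", ", ")")

  // True when every listed commit time already lies in [start, end], in which
  // case separate >= start and <= end filters cannot exclude any further rows.
  // Assumes fixed-width timestamps, so lexicographic order matches time order.
  def rangeIsRedundant(commitTimes: Seq[String], start: String, end: String): Boolean =
    commitTimes.forall(t => t.compareTo(start) >= 0 && t.compareTo(end) <= 0)
}
```

If `rangeIsRedundant` holds for the commits being returned, a single `hudiDF.filter(InPredicateSketch.inPredicate(InPredicateSketch.CommitTimeField, commitTimesToReturn))` would select the same rows as the `IN` filter combined with the two range filters.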