TheR1sing3un commented on code in PR #14205:
URL: https://github.com/apache/hudi/pull/14205#discussion_r2501915239
##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala:
##########
@@ -78,16 +80,16 @@ class SparkFileFormatInternalRowReaderContext(baseFileReader: SparkColumnarFileR
assert(getRecordContext.supportsParquetRowIndex())
}
val structType = HoodieInternalRowUtils.getCachedSchema(requiredSchema)
+ val (readSchema, readFilters) = getSchemaAndFiltersForRead(structType, hasRowIndexField)
Review Comment:
> I see the filters also include `requiredFilters`, can you investigate a
> little more what it is for MOR reading.
Thanks, Danny, you are right. I dug further into how `requiredFilters` is used
in MOR reading. It only has an actual value when we perform an incremental MOR
read. In that case it filters data on the `_hoodie_commit_time` metadata
column, which does not meet the primary-key condition, **but that** doesn't
matter: pushing this filter down does not affect the correctness of our data.
Because of the particular nature of the commit time itself, we can push this
filter down at any time.
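One way to sanity-check this argument is a small toy model. It is only a sketch, not Hudi's reader code: `Rec`, `merge`, and the sample data are hypothetical. It assumes the merge keeps, per key, the version with the greatest commit time, and that the incremental filter is a lower bound on `_hoodie_commit_time`.

```scala
// Toy model only: Rec, merge and the sample data are hypothetical, not Hudi APIs.
final case class Rec(key: String, commitTime: String, value: Int)

object PushdownSketch {
  // Merge base and log versions: per key, keep the version with the greatest commit time.
  def merge(versions: Seq[Rec]): Set[Rec] =
    versions.groupBy(_.key).values.map(_.maxBy(_.commitTime)).toSet

  def main(args: Array[String]): Unit = {
    val base = Seq(Rec("k1", "001", 5), Rec("k2", "001", 7))
    val log  = Seq(Rec("k1", "003", 50)) // later update to k1
    val all  = base ++ log

    // Incremental-style filter: lower bound on the commit time.
    val afterStart: Rec => Boolean = _.commitTime > "002"
    // Ordinary data filter on a non-key column.
    val smallValue: Rec => Boolean = _.value < 10

    // Commit-time filter: filtering before or after the merge gives the same rows.
    assert(merge(all.filter(afterStart)) == merge(all).filter(afterStart))

    // Data filter: pushing it below the merge lets the stale k1 version through.
    assert(merge(all.filter(smallValue)) != merge(all).filter(smallValue))
  }
}
```

Under these assumptions the commit-time filter commutes with the merge, because the version that wins the merge is also the one most likely to pass a lower bound on commit time. The toy check covers only that lower-bound case and a single data-column counterexample.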