xiarixiaoyao commented on a change in pull request #5168:
URL: https://github.com/apache/hudi/pull/5168#discussion_r837295103
##########
File path:
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala
##########
@@ -48,6 +48,13 @@ class MergeOnReadIncrementalRelation(sqlContext: SQLContext,
override type FileSplit = HoodieMergeOnReadFileSplit
+  override def imbueConfigs(sqlContext: SQLContext): Unit = {
+    sqlContext.sparkSession.sessionState.conf.setConfString("spark.sql.parquet.filterPushdown", "true")
+    sqlContext.sparkSession.sessionState.conf.setConfString("spark.sql.parquet.recordLevelFilter.enabled", "true")
Review comment:
MOR incremental queries need to filter data through the file-level (record-level) filters on the Spark side:
```
val PARQUET_RECORD_FILTER_ENABLED = buildConf("spark.sql.parquet.recordLevelFilter.enabled")
  .doc("If true, enables Parquet's native record-level filtering using the pushed down " +
    "filters. " +
    s"This configuration only has an effect when '${PARQUET_FILTER_PUSHDOWN_ENABLED.key}' " +
    "is enabled and the vectorized reader is not used. You can ensure the vectorized reader " +
    s"is not used by setting '${PARQUET_VECTORIZED_READER_ENABLED.key}' to false.")
  .version("2.3.0")
  .booleanConf
  .createWithDefault(false)
```
We must ensure that both of these configuration items take effect; otherwise the query will return duplicate records.
I have also posted another PR, https://github.com/apache/hudi/pull/5165, to enable vectorized reads for all COW/MOR reads.
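For reference, a minimal sketch of forcing the settings this incremental relation depends on. This assumes a live `SparkSession` named `spark`; the config keys come from Spark's `SQLConf` (the record-level filter key is quoted above), and per the quoted doc, `recordLevelFilter.enabled` only takes effect when the vectorized reader is not used:

```scala
// Hedged sketch, not the Hudi implementation itself: make sure Parquet
// filter pushdown and record-level filtering are both on, and the
// vectorized reader is off (record-level filtering is ignored when the
// vectorized path is used, per the SQLConf doc quoted above).
val conf = spark.sessionState.conf
conf.setConfString("spark.sql.parquet.filterPushdown", "true")
conf.setConfString("spark.sql.parquet.recordLevelFilter.enabled", "true")
conf.setConfString("spark.sql.parquet.enableVectorizedReader", "false")
```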
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]