alexandrefimov commented on code in PR #16614:
URL: https://github.com/apache/iceberg/pull/16614#discussion_r3328907921
##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/BaseReader.java:
##########
@@ -125,6 +148,10 @@ protected NameMapping nameMapping() {
return nameMapping;
}
Review Comment:
Thanks for working on this. I was looking at whether the setting reaches all
Parquet read paths and noticed one path that may still be uncovered.
`BaseRowReader` and `BaseBatchReader` now call
`setAll(parquetReadProperties())` for Parquet data files, but
`SparkDeleteFilter` still creates `BaseDeleteLoader`/`CachingDeleteLoader` with
only `loadInputFile`. `BaseDeleteLoader.openDeletes` then builds the reader
without any read properties:
`FormatModelRegistry.readBuilder(format, Record.class, inputFile)`
For S3FileIO-backed equality/position delete files, the `InputFile` is not a
`HadoopInputFile`, so `Parquet.buildReadOptions` may not see the Spark Hadoop
configuration and may fall back to the default vectored IO setting. That would
mean `spark.hadoop.parquet.hadoop.vectored.io.enabled=false` may still be
ignored while applying Parquet delete files.
Could you check whether delete-file reads need the same propagated
`parquetReadProperties`? If so, a small regression test with a Parquet
position/equality delete file would make this easier to keep covered.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]