Re: [PR] Spark, Parquet: honor Hadoop vectored IO read config [iceberg]

via GitHub Sat, 30 May 2026 08:03:41 -0700


alexandrefimov commented on code in PR #16614:
URL: https://github.com/apache/iceberg/pull/16614#discussion_r3328907921



##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/BaseReader.java:
##########
@@ -125,6 +148,10 @@ protected NameMapping nameMapping() {
     return nameMapping;
   }

Review Comment:
   Thanks for working on this. I was looking at whether the setting reaches all 
Parquet read paths and noticed one path that may still be uncovered.
   
   `BaseRowReader` and `BaseBatchReader` now call 
`setAll(parquetReadProperties())` for Parquet data files, but 
`SparkDeleteFilter` still creates `BaseDeleteLoader`/`CachingDeleteLoader` with 
only `loadInputFile`. `BaseDeleteLoader.openDeletes` then builds the reader 
without any read properties:
   
   `FormatModelRegistry.readBuilder(format, Record.class, inputFile)`
   
   For S3FileIO-backed equality/position delete files, the `InputFile` is not a 
`HadoopInputFile`, so `Parquet.buildReadOptions` may not see the Spark Hadoop 
configuration and may fall back to the default vectored IO setting. That would 
mean `spark.hadoop.parquet.hadoop.vectored.io.enabled=false` may still be 
ignored while applying Parquet delete files.
   
   Could you check whether delete-file reads need the same propagated 
`parquetReadProperties`? If so, a small regression test with a Parquet 
position/equality delete file would make this easier to keep covered.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Spark, Parquet: honor Hadoop vectored IO read config [iceberg]

Reply via email to