kevinjqliu opened a new pull request, #16307: URL: https://github.com/apache/iceberg/pull/16307
Backport of #15683 (and length fix #16284) to `spark/v3.4`.

Introduces `SerializableFileIOWithSize` to broadcast a table's `FileIO` to executors alongside the table metadata. Provides a `KnownSizeEstimation` so Spark skips the expensive `SizeEstimator` walk during broadcast, and makes `close()` a no-op on executors so broadcast cleanup does not destroy the driver's `FileIO`.

### Adaptation note

v3.4's `BaseReader` still used the legacy `table.encryption().decrypt(...)` path. I switched that one method to `fileIO.bulkDecrypt(...)` to match v3.5/4.0/4.1, since the broadcast `FileIO` is now an `EncryptingFileIO` (combined in the constructor). All other files match the v3.5 patch byte-for-byte (with paths translated).

### Validation

- `./gradlew -DsparkVersions=3.4 :iceberg-spark:iceberg-spark-3.4_2.12:test --tests "*SerializableFileIOWithSize*"` (new test, passes)
- `./gradlew -DsparkVersions=3.4 :iceberg-spark:iceberg-spark-3.4_2.12:test --tests "org.apache.iceberg.spark.source.TestSparkReaderDeletes"` (passes)
- Compile-checked spark-extensions tests
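
For readers unfamiliar with the pattern, the wrapper behavior described in the summary (a fixed size estimate plus a `close()` that leaves the driver's resource alone) can be sketched roughly as follows. This is an illustration only, not the code in this PR: `SimpleFileIO`, `KnownSize`, `DriverFileIO`, and `FileIOWithSize` are hypothetical stand-ins for Iceberg's `FileIO`, Spark's `KnownSizeEstimation`, and `SerializableFileIOWithSize`, and the 1024-byte estimate is an arbitrary placeholder. The sketch also makes `close()` a no-op unconditionally, whereas the PR describes that behavior for executors.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class BroadcastIoSketch {

  // Hypothetical stand-in for Iceberg's FileIO, reduced to what the sketch needs.
  interface SimpleFileIO extends AutoCloseable {
    String describe(String path);

    @Override
    void close(); // narrowed so callers need not handle a checked exception
  }

  // Hypothetical stand-in for Spark's KnownSizeEstimation.
  interface KnownSize {
    long estimatedSize();
  }

  // Driver-side IO that owns a resource; only the driver should really close it.
  static final class DriverFileIO implements SimpleFileIO {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    @Override
    public String describe(String path) {
      if (closed.get()) {
        throw new IllegalStateException("FileIO is closed");
      }
      return "file:" + path;
    }

    @Override
    public void close() {
      closed.set(true);
    }

    boolean isClosed() {
      return closed.get();
    }
  }

  // Wrapper that would be broadcast: reports a fixed size and ignores close().
  static final class FileIOWithSize implements SimpleFileIO, KnownSize {
    // Arbitrary placeholder; a cheap constant replaces a deep object-graph walk.
    private static final long FIXED_SIZE_BYTES = 1024L;

    private final SimpleFileIO delegate;

    FileIOWithSize(SimpleFileIO delegate) {
      this.delegate = delegate;
    }

    @Override
    public String describe(String path) {
      return delegate.describe(path);
    }

    @Override
    public long estimatedSize() {
      return FIXED_SIZE_BYTES;
    }

    @Override
    public void close() {
      // Intentionally a no-op: cleanup of the broadcast copy must not close
      // the delegate that the driver still uses.
    }
  }

  public static void main(String[] args) {
    DriverFileIO driverIO = new DriverFileIO();
    FileIOWithSize broadcastIO = new FileIOWithSize(driverIO);

    broadcastIO.close(); // simulates broadcast cleanup

    System.out.println("driver closed: " + driverIO.isClosed());
    System.out.println("still usable: " + broadcastIO.describe("data.parquet"));
    System.out.println("estimated size: " + broadcastIO.estimatedSize());
  }
}
```

The point of the design is that ownership stays with the driver: the wrapper delegates reads but never forwards `close()`, so the driver's `FileIO` remains usable after the broadcast copy is cleaned up.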
