JFinis opened a new issue, #8430: URL: https://github.com/apache/iceberg/issues/8430
### Apache Iceberg version

1.3.1 (latest release)

### Query engine

Spark

### Please describe the bug 🐞

Icebergs that contain Parquet v2 files (delta encoding) and do not have the `read.parquet.vectorization.enabled` property set to `false` cannot be read by Spark (and possibly by other engines that use vectorized reads). The read fails with an error saying, roughly, that vectorized reads aren't supported for delta encoding.

Why is this a problem?
-----------------------

Because `read.parquet.vectorization.enabled` is enabled by default, any Iceberg that lacks this property will stop working once it contains delta encodings.

This property is an implementation detail of the reference implementation of Iceberg. It is not defined in the Iceberg spec. Thus, a conforming implementation (different from the reference implementation) could (**and does**, that's why I'm writing this issue!) write an Iceberg that uses Parquet v2 files with delta encoding and not write the property. According to the spec, this is a perfectly valid Iceberg. However, the reference implementation will not be able to read it.

The reference implementation should be able to read any and every valid Iceberg, so this property should not be necessary.

Solution
--------

The property `read.parquet.vectorization.enabled` should not be required to read an Iceberg. Instead, the reference implementation should fall back to a non-vectorized reader if an encoding is used for which it doesn't support vectorized reads.

Alternatively, the reference implementation should treat the property as `false` when it is not set.

Either of these fixes will allow spec-conforming Icebergs without the property to be read by the reference implementation.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
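
For illustration only, the fallback proposed under "Solution" could be sketched as below. This is not Iceberg's actual code: the function `use_vectorized_reader`, the `Encoding` enum, and the `VECTORIZABLE` set are hypothetical names, and the choice of which encodings the vectorized path can decode is an assumption made for the example. Only the property name and its default of `true` come from the report above.

```python
from enum import Enum


class Encoding(Enum):
    """A few Parquet encoding names, used here only as labels."""
    PLAIN = "PLAIN"
    RLE_DICTIONARY = "RLE_DICTIONARY"
    DELTA_BINARY_PACKED = "DELTA_BINARY_PACKED"
    DELTA_LENGTH_BYTE_ARRAY = "DELTA_LENGTH_BYTE_ARRAY"
    DELTA_BYTE_ARRAY = "DELTA_BYTE_ARRAY"


# Assumption for this sketch: the vectorized path handles only these encodings.
VECTORIZABLE = frozenset({Encoding.PLAIN, Encoding.RLE_DICTIONARY})

PROPERTY = "read.parquet.vectorization.enabled"


def use_vectorized_reader(table_properties, file_encodings):
    """Decide per file whether the vectorized reader may be used.

    The property is treated as a preference, not a requirement: even when it
    is unset (default "true") or explicitly "true", the reader falls back to
    the non-vectorized path if the file uses an unsupported encoding.
    """
    requested = table_properties.get(PROPERTY, "true").lower() == "true"
    supported = all(enc in VECTORIZABLE for enc in file_encodings)
    return requested and supported
```

With this shape, a table that omits the property and contains `DELTA_BINARY_PACKED` pages would be read through the non-vectorized path instead of failing, while an explicit `false` still disables vectorization for every file.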
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
