JFinis opened a new issue, #8430:
URL: https://github.com/apache/iceberg/issues/8430

   ### Apache Iceberg version
   
   1.3.1 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   Iceberg tables that contain Parquet v2 files (delta encoding) and do not 
have the `read.parquet.vectorization.enabled` property set to `false` cannot 
be read by Spark (and possibly by other engines that use vectorized reads). 
The read fails with an error stating, roughly, that vectorized reads are not 
supported for delta encoding.
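
As a stopgap, the property can be set to `false` on an affected table. The 
snippet below is a minimal sketch that only builds the Spark SQL statement; 
the helper name and the table name `db.events` are hypothetical, and running 
the statement requires a Spark session configured with an Iceberg catalog.

```python
def disable_vectorization_sql(table_name: str) -> str:
    """Build a Spark SQL statement that turns off vectorized Parquet reads for one table."""
    return (
        f"ALTER TABLE {table_name} "
        "SET TBLPROPERTIES ('read.parquet.vectorization.enabled' = 'false')"
    )


# `db.events` is a hypothetical table name used only for illustration.
statement = disable_vectorization_sql("db.events")
print(statement)
# With a live session this would be executed as: spark.sql(statement)
```

This only works around the problem for a single table; it does not help 
writers that never set the property in the first place.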
   
   Why is this a problem?
   -----------------------
   
   Because the property `read.parquet.vectorization.enabled` is enabled by 
default, any Iceberg table that does not set it becomes unreadable once it 
contains delta-encoded files. The property is an implementation detail of the 
reference implementation of Iceberg and is not defined in the Iceberg spec.
   
   Thus, a conforming implementation (different from the reference 
implementation) could (**and does**, that's why I'm writing this issue!) write 
an Iceberg table with delta-encoded Parquet v2 files and not write the 
property. According to the spec, this is a perfectly valid Iceberg table. 
However, the reference implementation will not be able to read it. The 
reference implementation should be able to read every valid Iceberg table, so 
this property should not be necessary.
   
   Solution
   --------
   
   The property `read.parquet.vectorization.enabled` should not be required to 
read an Iceberg table. Instead, the reference implementation should fall back 
to a non-vectorized reader when a file uses an encoding for which vectorized 
reads are not supported.
   Alternatively, the reference implementation should treat the property as 
`false` when it is not set.
   
   Either of these fixes would let the reference implementation read 
spec-conforming Iceberg tables that do not set the property.
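
The two proposed fixes can be illustrated with the following sketch. It is 
not Iceberg's actual reader-selection code: `choose_reader`, the `"row"` and 
`"vectorized"` labels, and the `default_enabled` parameter are hypothetical, 
and the set of unsupported encodings is an assumption based on this issue's 
statement that delta encodings are not supported.

```python
from typing import Iterable, Mapping

VECTORIZATION_PROPERTY = "read.parquet.vectorization.enabled"

# Assumption: the Parquet delta encodings are the ones lacking vectorized support.
UNSUPPORTED_VECTORIZED_ENCODINGS = frozenset(
    {"DELTA_BINARY_PACKED", "DELTA_LENGTH_BYTE_ARRAY", "DELTA_BYTE_ARRAY"}
)


def choose_reader(
    table_properties: Mapping[str, str],
    file_encodings: Iterable[str],
    default_enabled: bool = True,
) -> str:
    """Return "vectorized" or "row" for a file, given table properties and its encodings."""
    raw = table_properties.get(VECTORIZATION_PROPERTY)
    # Second fix: passing default_enabled=False treats a missing property as `false`.
    enabled = default_enabled if raw is None else raw.strip().lower() == "true"
    if not enabled:
        return "row"
    # First fix: fall back instead of failing when an encoding is unsupported.
    if UNSUPPORTED_VECTORIZED_ENCODINGS & set(file_encodings):
        return "row"
    return "vectorized"


# A table without the property that contains a delta-encoded file is still readable.
print(choose_reader({}, {"PLAIN", "DELTA_BINARY_PACKED"}))
```

With the fallback in place, the default can stay enabled, so tables that only 
use supported encodings keep the vectorized path.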
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

