anthonysgro opened a new issue, #7162:
URL: https://github.com/apache/iceberg/issues/7162

   ### Feature Request / Improvement
   
   As of v1.1.0, if you want to use both Spark and Athena with your 
Iceberg tables, you must disable the vectorized reader. Athena writes fields 
with delta encoding, which the vectorized reader does not support.
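
For anyone looking for the workaround mentioned above, here is a minimal sketch of how vectorized reads could be turned off from PySpark. It assumes the table property `read.parquet.vectorization.enabled` and the session setting `spark.sql.iceberg.vectorization.enabled` behave as documented for your Iceberg version; the table name `glue.db.events` and both helper functions are hypothetical, and the `spark` argument is an existing SparkSession that is not created here.

```python
# Property names as documented by Iceberg; verify them against your version.
TABLE_PROPERTY = "read.parquet.vectorization.enabled"
SESSION_CONF = "spark.sql.iceberg.vectorization.enabled"


def disable_vectorization_sql(table: str) -> str:
    """Build an ALTER TABLE statement that turns off vectorized Parquet reads."""
    return f"ALTER TABLE {table} SET TBLPROPERTIES ('{TABLE_PROPERTY}' = 'false')"


def disable_vectorization(spark, table: str) -> None:
    """Apply the workaround using an existing SparkSession (hypothetical helper)."""
    # Session-level override: affects reads in this Spark session only.
    spark.conf.set(SESSION_CONF, "false")
    # Table-level setting: stored in the table metadata.
    spark.sql(disable_vectorization_sql(table))


if __name__ == "__main__":
    # "glue.db.events" is a made-up table identifier for illustration.
    print(disable_vectorization_sql("glue.db.events"))
```

The table-level property is stored with the table, while the session setting only lasts for the current Spark session. Either way, reads fall back to the non-vectorized path, which is the cost this issue is asking to avoid.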
   
   If you have ever hit the following error, you have probably been impacted by 
this issue:
   `
   java.lang.UnsupportedOperationException: Cannot support vectorized reads for column [email] optional binary email (STRING) = 1 with encoding DELTA_BYTE_ARRAY. Disable vectorized reads to read this table/file
        at org.apache.iceberg.arrow.vectorized.parquet.VectorizedPageIterator.initDataReader(VectorizedPageIterator.java:96)
   `
   
   Spark implemented this support in 2022: 
https://github.com/apache/spark/pull/35262
   However, Iceberg uses its own vectorized reader, so that change does not 
help here.
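
To give a rough idea of what a reader has to handle, here is a simplified sketch of the idea behind the `DELTA_BYTE_ARRAY` encoding named in the error: each value is stored as the length of the prefix it shares with the previous value, plus the remaining suffix. This is not Iceberg's or Spark's reader code. The function name is hypothetical, and it assumes the prefix lengths and suffixes have already been unpacked from the page (in the actual Parquet format they are themselves delta-encoded).

```python
from typing import List


def decode_delta_byte_array(prefix_lengths: List[int], suffixes: List[bytes]) -> List[bytes]:
    """Rebuild values from (shared-prefix length, suffix) pairs."""
    if len(prefix_lengths) != len(suffixes):
        raise ValueError("prefix_lengths and suffixes must have the same length")

    values: List[bytes] = []
    previous = b""
    for prefix_len, suffix in zip(prefix_lengths, suffixes):
        if prefix_len < 0 or prefix_len > len(previous):
            raise ValueError("prefix length exceeds the previous value")
        # Reuse the first prefix_len bytes of the previous value, then append the suffix.
        current = previous[:prefix_len] + suffix
        values.append(current)
        previous = current
    return values


if __name__ == "__main__":
    # Made-up sample data: three email-like strings sharing the prefix "al".
    decoded = decode_delta_byte_array(
        [0, 2, 2],
        [b"alice@x.io", b"an@x.io", b"ex@x.io"],
    )
    print(decoded)  # [b'alice@x.io', b'alan@x.io', b'alex@x.io']
```

A vectorized reader would need to do the equivalent work in batches and write the results into Arrow vectors, which is presumably where most of the implementation effort lies.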
   
   Is it possible to implement support for these encodings in Iceberg's 
vectorized reader? It would solve a significant interoperability problem 
between Athena, Spark, and possibly other query engines that read the same 
tables.
   
   ### Query engine
   
   None


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

