[GitHub] [iceberg] anthonysgro commented on issue #5593: Reading records inserted using Athena throws UOE exception when read using Spark (AWS)

via GitHub Tue, 21 Mar 2023 11:22:40 -0700


anthonysgro commented on issue #5593:
URL: https://github.com/apache/iceberg/issues/5593#issuecomment-1478386917


   @a-agmon `read.parquet.vectorization.enabled` is not a table property that 
Athena supports (see the list of supported properties): 
https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-creating-tables.html#querying-iceberg-table-properties
   
   You have to set it at the table level from another engine (for example, 
Spark in a Jupyter notebook), or you can add this Spark config to your job:
   `.config("spark.sql.iceberg.vectorization.enabled", "false")`
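   Both options can be sketched in PySpark. This is a minimal sketch, assuming 
PySpark with the Iceberg Spark runtime on the classpath; the helper names and 
the table name `glue_catalog.db.events` are hypothetical. The `ALTER TABLE` 
statement is meant to run in Spark, not Athena, because Athena rejects this 
property.

```python
# Sketch: two ways to disable Iceberg's vectorized Parquet reader from Spark.
# Assumptions: PySpark plus the Iceberg Spark runtime; helper names and the
# table name in the usage comments are hypothetical.

# Session-wide setting from the comment; it applies to every Iceberg read in the job.
SESSION_CONF = {"spark.sql.iceberg.vectorization.enabled": "false"}


def disable_table_vectorization_sql(table: str) -> str:
    """Build a Spark SQL statement that sets the table property Athena rejects."""
    return (
        f"ALTER TABLE {table} "
        "SET TBLPROPERTIES ('read.parquet.vectorization.enabled' = 'false')"
    )


def apply_session_conf(builder):
    """Apply SESSION_CONF to a SparkSession.builder (or any object with .config)."""
    for key, value in SESSION_CONF.items():
        builder = builder.config(key, value)
    return builder


# Usage (requires a Spark environment, so shown as comments):
# from pyspark.sql import SparkSession
# spark = apply_session_conf(SparkSession.builder.appName("read-athena-table")).getOrCreate()
# spark.sql(disable_table_vectorization_sql("glue_catalog.db.events"))  # hypothetical table
```

   The table property is stored in the table metadata, so it affects every 
Spark job that reads the table; the session setting only affects the job 
that sets it.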
   
   However, I don't like this workaround, because you lose all the benefits of 
the vectorized reader, which makes jobs much more efficient. I am also hitting 
the original issue. I can bypass it by rewriting my SQL joins to use `ON` 
instead of `USING` and by disabling the vectorized reader, but there is still 
a fundamental problem with the vectorized reader.
   
   The Iceberg vectorized reader does not seem to support the delta encoding 
that Athena writes. However, it looks like Spark added support for it in 
January 2022: https://github.com/apache/spark/pull/35262
   
   Adding the same support to the Iceberg vectorized reader would solve one of 
the underlying problems raised in this issue. Could you reopen it?
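   To check whether a data file written by Athena actually uses a delta 
encoding, one option is to inspect its Parquet footer. This is a minimal 
sketch, assuming pyarrow is installed and a local copy of one of the table's 
data files is available (the file name below is hypothetical). It reports 
every column whose chunks list an encoding starting with `DELTA_`, which is 
the prefix Parquet uses for its delta encodings.

```python
# Sketch: find Parquet columns whose chunks use a DELTA_* encoding.
# Assumption: `metadata` is a pyarrow.parquet.FileMetaData, or any object with the
# same num_row_groups / row_group(i) / num_columns / column(j) shape.


def delta_encoded_columns(metadata):
    """Return the sorted column paths that use a DELTA_* encoding in any row group."""
    found = set()
    for rg_index in range(metadata.num_row_groups):
        row_group = metadata.row_group(rg_index)
        for col_index in range(row_group.num_columns):
            chunk = row_group.column(col_index)
            # chunk.encodings is a tuple of encoding names, e.g. ("PLAIN", "RLE").
            if any(name.startswith("DELTA_") for name in chunk.encodings):
                found.add(chunk.path_in_schema)
    return sorted(found)


# Usage (requires pyarrow; the file name is hypothetical):
# import pyarrow.parquet as pq
# print(delta_encoded_columns(pq.ParquetFile("athena-written-file.parquet").metadata))
```

   An empty result means that file's footer lists no delta encodings for any 
column.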


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


