anthonysgro commented on issue #5593: URL: https://github.com/apache/iceberg/issues/5593#issuecomment-1478386917
@a-agmon `read.parquet.vectorization.enabled` is not a supported table property in Athena: https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-creating-tables.html#querying-iceberg-table-properties

You have to set it either at the table level, through something like a Jupyter notebook, or by including this Spark conf in your job: `.config("spark.sql.iceberg.vectorization.enabled", "false")`

However, I don't like this workaround, because you lose all the benefits of the vectorized reader, which makes jobs a lot more efficient.

I am also encountering the original issue. While I can bypass it by changing my SQL joins to use "ON" instead of "USING" and disabling the vectorized reader, there is still a fundamental problem: the Iceberg vectorized reader seemingly does not support Athena's delta encoding. However, it looks like this encoding has been supported in Spark since January 2022: https://github.com/apache/spark/pull/35262 Supporting it in Iceberg as well would solve one of the underlying problems that this issue brings up.

Can you re-open this issue?
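
For reference, here is a minimal sketch of the two ways to disable the vectorized reader mentioned above. The property and conf names come from this comment. The table name `my_catalog.db.my_table` and the helper functions are hypothetical, and the snippet only builds the statement and setting, so it runs without Spark installed.

```python
# Names taken from the comment above.
TABLE_PROPERTY = "read.parquet.vectorization.enabled"
SESSION_CONF = "spark.sql.iceberg.vectorization.enabled"


def alter_table_sql(table: str, enabled: bool) -> str:
    """Build a Spark SQL statement that sets the table-level property."""
    value = "true" if enabled else "false"
    return f"ALTER TABLE {table} SET TBLPROPERTIES ('{TABLE_PROPERTY}'='{value}')"


def session_conf(enabled: bool) -> dict:
    """Build the job-level setting as a key/value pair for the Spark session builder."""
    return {SESSION_CONF: "true" if enabled else "false"}


# Hypothetical table name, used only for illustration.
stmt = alter_table_sql("my_catalog.db.my_table", enabled=False)
conf = session_conf(enabled=False)

print(stmt)
print(conf)

# With PySpark and an Iceberg catalog configured (assumed, not shown here):
#   spark.sql(stmt)                 # option 1: table level
#   builder.config(key, value)      # option 2: job level, for each item in conf
```

Either option turns vectorization off entirely, so it has the same efficiency cost described above.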
