[
https://issues.apache.org/jira/browse/HUDI-6453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739417#comment-17739417
]
Shawn Chang commented on HUDI-6453:
-----------------------------------
I've attached a `schema_reproduce.scala` script for reproduction.
Run this script with Spark to create a table on Glue and evolve its schema,
then try to query the table. You should be able to reproduce the
`HIVE_PARTITION_SCHEMA_MISMATCH` issue described
here: https://docs.aws.amazon.com/athena/latest/ug/updates-and-partitions.html
> Cascade schema changes to partition level
> -----------------------------------------
>
> Key: HUDI-6453
> URL: https://issues.apache.org/jira/browse/HUDI-6453
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Shawn Chang
> Assignee: Shawn Chang
> Priority: Major
> Labels: pull-request-available
> Attachments: schema_reproduce.scala
>
>
> https://github.com/apache/hudi/blob/a07549fd13bae6a74d9b82956e10520c55de3c63/hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java#L333
> Currently, the Glue sync tool in Hudi does not cascade schema changes to the
> partition level in Glue.
> This causes a schema mismatch when users alter their table schema
> in a newer partition and then query older partitions with engines that are
> aware of partition-level schemas, such as Athena.
> Glue doesn't provide a convenient API for cascading, so we have to update Glue
> partitions directly.
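
To illustrate what "update Glue partitions directly" could involve, here is a minimal sketch of the selection logic only. It is not the code in `AWSGlueCatalogSyncClient` and makes no AWS calls. The class `PartitionSchemaCascade` and its methods are hypothetical names introduced for this example. Schemas are modelled as ordered maps of column name to type. The batch size of 100 is an assumption based on the documented entry limit of Glue's `BatchUpdatePartition` request and should be checked against the current Glue API reference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Hudi or the AWS SDK.
public class PartitionSchemaCascade {

    // Assumed per-request entry limit for Glue BatchUpdatePartition.
    public static final int ASSUMED_BATCH_LIMIT = 100;

    // Flattens an ordered schema into "name type" strings so that
    // both column order and column types take part in the comparison.
    private static List<String> flatten(Map<String, String> columns) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> c : columns.entrySet()) {
            out.add(c.getKey() + " " + c.getValue());
        }
        return out;
    }

    // Returns the value lists of partitions whose columns differ from the table's.
    public static List<List<String>> stalePartitions(
            Map<String, String> tableColumns,
            Map<List<String>, Map<String, String>> partitionColumns) {
        List<String> expected = flatten(tableColumns);
        List<List<String>> stale = new ArrayList<>();
        for (Map.Entry<List<String>, Map<String, String>> p : partitionColumns.entrySet()) {
            if (!flatten(p.getValue()).equals(expected)) {
                stale.add(p.getKey());
            }
        }
        return stale;
    }

    // Splits the work into request-sized chunks.
    public static <T> List<List<T>> chunk(List<T> items, int size) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            out.add(new ArrayList<>(items.subList(i, Math.min(i + size, items.size()))));
        }
        return out;
    }

    public static void main(String[] args) {
        // Table schema after evolution: price was widened to double.
        Map<String, String> table = new LinkedHashMap<>();
        table.put("id", "bigint");
        table.put("price", "double");

        // The older partition still carries the pre-evolution type.
        Map<String, String> oldSchema = new LinkedHashMap<>();
        oldSchema.put("id", "bigint");
        oldSchema.put("price", "int");

        Map<List<String>, Map<String, String>> partitions = new LinkedHashMap<>();
        partitions.put(Arrays.asList("2023-06-01"), oldSchema);
        partitions.put(Arrays.asList("2023-07-01"), new LinkedHashMap<>(table));

        List<List<String>> stale = stalePartitions(table, partitions);
        // Each chunk would become one partition-update request in a real sync.
        for (List<List<String>> batch : chunk(stale, ASSUMED_BATCH_LIMIT)) {
            System.out.println("would update partitions: " + batch);
        }
    }
}
```

In a real sync client, each chunk would be turned into a partition-update request that copies the table's column list into the partition's storage descriptor. Comparing first avoids rewriting partitions that already match.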
--
This message was sent by Atlassian Jira
(v8.20.10#820010)