[ 
https://issues.apache.org/jira/browse/HUDI-6453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739417#comment-17739417
 ] 

Shawn Chang commented on HUDI-6453:
-----------------------------------

I've attached a `schema_reproduce.scala` script for reproduction.

 

Run this script with Spark to create a table on Glue and evolve its schema,
then try to query the table. You should be able to reproduce the
`HIVE_PARTITION_SCHEMA_MISMATCH` issue described here:
https://docs.aws.amazon.com/athena/latest/ug/updates-and-partitions.html

> Cascade schema changes to partition level
> -----------------------------------------
>
>                 Key: HUDI-6453
>                 URL: https://issues.apache.org/jira/browse/HUDI-6453
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Shawn Chang
>            Assignee: Shawn Chang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: schema_reproduce.scala
>
>
> https://github.com/apache/hudi/blob/a07549fd13bae6a74d9b82956e10520c55de3c63/hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java#L333
> Currently the Glue sync tool in Hudi does not cascade schema changes to the
> partition level in Glue.
> This causes a schema mismatch issue when users alter their table schema
> in a newer partition and then query older partitions with engines that are
> aware of partition-level schemas, like Athena.
> Glue doesn't provide a convenient API for cascading, so we have to update Glue
> partitions directly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
