[ 
https://issues.apache.org/jira/browse/HUDI-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenning Ding updated HUDI-4278:
-------------------------------
    Description: 
Each time Hudi upserts records, it syncs to the catalog and updates the 
{{last_commit_time_sync}} property on the Glue table. By default, every update 
to this property makes Glue create a new table version and archive the old 
one. Customers who update a Hudi table frequently will therefore eventually 
hit the Glue table version limit.

To avoid this, Hudi passes a {{skipGlueArchive}} parameter in the environment 
context, which is forwarded to the AWS Glue metadata service, so the Glue 
client can decide whether to skip archiving.
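
The flow described above could look roughly like the following. This is a 
minimal, self-contained sketch and not the actual Hudi or AWS SDK code: the 
class {{SkipArchiveSketch}}, the {{UpdateTableCall}} type, and the helper 
methods are hypothetical stand-ins, and the environment context is modelled as 
a plain string map. Only the names {{skipGlueArchive}} and 
{{last_commit_time_sync}} come from this issue.

```java
import java.util.HashMap;
import java.util.Map;

class SkipArchiveSketch {
    // Key name taken from the issue text; how Hudi stores it is an assumption.
    static final String SKIP_GLUE_ARCHIVE = "skipGlueArchive";

    /** Hypothetical stand-in for a catalog "update table" request. */
    static final class UpdateTableCall {
        final String tableName;
        final Map<String, String> tableProperties;
        final boolean skipArchive;

        UpdateTableCall(String tableName, Map<String, String> tableProperties, boolean skipArchive) {
            this.tableName = tableName;
            this.tableProperties = tableProperties;
            this.skipArchive = skipArchive;
        }
    }

    /** Builds an environment context (modelled as a map) that carries the flag. */
    static Map<String, String> environmentContext(boolean skipArchive) {
        Map<String, String> ctx = new HashMap<>();
        ctx.put(SKIP_GLUE_ARCHIVE, Boolean.toString(skipArchive));
        return ctx;
    }

    /**
     * Prepares the property update performed on each sync. The flag is read
     * from the context and defaults to false, i.e. old versions are archived
     * unless the caller opts out.
     */
    static UpdateTableCall buildUpdate(String tableName, String lastCommitTime,
                                       Map<String, String> envContext) {
        boolean skip = Boolean.parseBoolean(envContext.getOrDefault(SKIP_GLUE_ARCHIVE, "false"));
        Map<String, String> props = new HashMap<>();
        props.put("last_commit_time_sync", lastCommitTime);
        return new UpdateTableCall(tableName, props, skip);
    }
}
```

In this sketch, a caller that syncs frequently would build the context with 
{{environmentContext(true)}}; a context without the key keeps the default 
behaviour of archiving old versions.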

> Add skip archive option when syncing to AWS Glue tables
> -------------------------------------------------------
>
>                 Key: HUDI-4278
>                 URL: https://issues.apache.org/jira/browse/HUDI-4278
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Wenning Ding
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.7#820007)