[ 
https://issues.apache.org/jira/browse/HIVE-25293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18033487#comment-18033487
 ] 

Patrick Duin edited comment on HIVE-25293 at 10/28/25 9:26 AM:
---------------------------------------------------------------

Hi (user here, not a Hive dev), just sharing in case this helps. We had a 
similar issue (tables with 50k - 100k+ partitions) and, for various reasons, 
found it hard to upgrade to a newer version of Hive. 
One workaround is to drop the table and recreate it from scratch, i.e. create 
the table and re-run all the ADD PARTITION statements (you can do this as a 
temporary table and then rename it). That should clean up the underlying RDS 
data. 

Note: dropping the table itself can be hard; you might need to delete the 
partitions in batches first.
The long-term solution for us is to move to Iceberg tables, since those no 
longer keep partitions in the Hive metastore.
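
A minimal sketch of that workaround; the table name, partition values, and 
location are illustrative only, and the batched DROP PARTITION predicates 
would need to match your actual partition keys:

{code:sql}
-- Recreate the schema under a temporary name, then re-register partitions.
CREATE TABLE test_table_new LIKE test_table;
ALTER TABLE test_table_new ADD PARTITION (p='a')
  LOCATION 'hdfs:///warehouse/test_table/p=a';  -- hypothetical location
-- ... repeat ADD PARTITION for each remaining partition ...

-- Drop the old table; if a single DROP TABLE overwhelms the metastore,
-- delete the partitions in batches first:
ALTER TABLE test_table DROP PARTITION (p < 'm');
ALTER TABLE test_table DROP PARTITION (p >= 'm');
DROP TABLE test_table;

ALTER TABLE test_table_new RENAME TO test_table;
{code}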



> Altering a partitioned table with the "cascade" option creates too many column records.
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-25293
>                 URL: https://issues.apache.org/jira/browse/HIVE-25293
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 2.3.3, 3.1.2
>            Reporter: yongtaoliao
>            Assignee: yongtaoliao
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When altering a partitioned table with the "cascade" option, all partitions 
> are supposed to be updated. Currently, a new CD_ID is created for each 
> partition, each associated with its own set of columns, which causes a large 
> amount of redundant data in the metastore database.
> The following DDL statements can reproduce this scenario:
>  
> {code:java}
> create table test_table (f1 int) partitioned by (p string);
> alter table test_table add partition(p='a');
> alter table test_table add partition(p='b');
> alter table test_table add partition(p='c');
> alter table test_table add columns (f2 int) cascade;{code}
> All partitions use the table's `CD_ID` before the columns are added, while 
> each partition uses its own `CD_ID` afterwards.
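>  
> Assuming the standard metastore backend schema (tables `TBLS`, `PARTITIONS`, 
> and `SDS`), the duplication can be observed by querying the backing database 
> directly; `'test_table'` here refers to the example table above:
>  
> {code:sql}
> -- Each partition's storage descriptor points at a CD_ID; before the
> -- cascade ALTER these all match the table's CD_ID, afterwards each
> -- partition gets its own.
> SELECT p.PART_NAME, s.CD_ID
> FROM PARTITIONS p
> JOIN SDS s ON p.SD_ID = s.SD_ID
> JOIN TBLS t ON p.TBL_ID = t.TBL_ID
> WHERE t.TBL_NAME = 'test_table';{code}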
>  
> My proposal is that all partitions should share the same `CD_ID` when the 
> table is altered with the "cascade" option.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
