[ 
https://issues.apache.org/jira/browse/HUDI-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udit Mehrotra updated HUDI-5603:
--------------------------------
    Description: 
Currently, when a user drops a partition using Spark SQL 
[https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-alter-table.html#drop-partition]
and then rolls back the drop partition commit, the partition does not appear 
in the output of the *SHOW PARTITIONS* command. The reason is that, as part of 
the drop partition operation, Hudi also deletes the partition from the table 
metadata. However, rolling the commit back does not add the partition back to 
the Hudi table metadata. Hence, *SHOW PARTITIONS* does not return the 
rolled-back partition.
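
The behaviour described above can be sketched with a small toy model. This is 
only an illustration under stated assumptions: ToyTable, its fields, and the 
restore_metadata flag are hypothetical names invented for this sketch and are 
not Hudi classes or APIs. It models a drop that removes the partition from 
metadata and a rollback that restores only the data.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy stand-in for a partitioned table; not Hudi's real data structures."""
    # partition path -> rows visible to readers
    data: dict = field(default_factory=dict)
    # partition paths tracked in table metadata (what SHOW PARTITIONS would list)
    metadata_partitions: set = field(default_factory=set)
    # stack of (action, partition, saved_rows) entries, newest last
    timeline: list = field(default_factory=list)

    def insert(self, partition, rows):
        self.data.setdefault(partition, []).extend(rows)
        self.metadata_partitions.add(partition)

    def drop_partition(self, partition):
        # Dropping hides the data and also removes the partition from metadata.
        saved = self.data.pop(partition, [])
        self.metadata_partitions.discard(partition)
        self.timeline.append(("drop_partition", partition, saved))

    def rollback_last(self, restore_metadata=False):
        action, partition, saved = self.timeline.pop()
        if action == "drop_partition":
            self.data[partition] = saved  # data becomes readable again
            if restore_metadata:
                # The step this issue reports as missing (assumed fix).
                self.metadata_partitions.add(partition)

    def show_partitions(self):
        return sorted(self.metadata_partitions)


table = ToyTable()
table.insert("dt=2023-01-01", ["a"])
table.insert("dt=2023-01-02", ["b"])
table.drop_partition("dt=2023-01-02")
table.rollback_last()  # data is back, metadata is not
print(table.show_partitions())  # ['dt=2023-01-01']
```

Calling rollback_last(restore_metadata=True) instead would list both 
partitions again, which is the outcome this issue asks for.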

As part of the drop partition command, Hudi schedules a clean operation for 
this partition's data, treating the drop as a HARD delete. However, the user 
may roll back the drop partition commit before the cleaner runs (or the user 
may have turned off the cleaner). In such scenarios, even though the data is 
rolled back, the partition still does not appear in the table metadata, 
leaving the Hudi table in a corrupt state.

We think we can enhance this functionality to support rollback for drop 
partitions. If we decide against it, then we should disallow rolling back 
commits that drop partitions so users don't end up in this state.
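
The fallback option (disallowing such rollbacks) could look roughly like the 
guard below. This is a minimal sketch, not Hudi code: the commit dict shape, 
the DROP_PARTITION label, and the RollbackRejected exception are all 
hypothetical and chosen only for illustration.

```python
class RollbackRejected(Exception):
    """Raised when a rollback targets a commit that dropped partitions."""


# Hypothetical label for commits created by a drop-partition operation.
DROP_PARTITION = "drop_partition"


def validate_rollback(commit):
    """Reject rollback of a drop-partition commit; return the commit otherwise.

    `commit` is a plain dict with 'instant' and 'operation' keys (assumed shape).
    """
    if commit["operation"] == DROP_PARTITION:
        raise RollbackRejected(
            f"commit {commit['instant']} dropped partitions and cannot be rolled back"
        )
    return commit
```

A caller would run this check before starting the rollback, so the table is 
never left with restored data but missing partition metadata.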

  was:
Currently it seems that when a user tries to drop a partition using Spark SQL 
[https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-alter-table.html#drop-partition]
and then performs a rollback on this dropped partition, they do not see this 
partition present when running the show partitions command. 

The drop partition command will at some point schedule a clean operation of 
this partition's data (unless the cleaner is off), making this a HARD delete. 
In this case we should add a rollback function for the replace commit that is 
created, as long as it is within the retention period. 

> Support Rollback for Dropped Partitions
> ----------------------------------------
>
>                 Key: HUDI-5603
>                 URL: https://issues.apache.org/jira/browse/HUDI-5603
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Rahil Chertara
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
