kaleshkk opened a new issue, #8918:
URL: https://github.com/apache/hudi/issues/8918

   **Problem Description:**
   I am using an Apache Hudi Copy-on-Write (CoW) table and have implemented a 
data pipeline that deletes entire partitions from the table. However, the 
physical files belonging to the deleted partitions are not being removed from 
storage.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   ```python
   from pyspark.sql.functions import col

   hudiOptions = {
       'hoodie.table.name': "table_name",
       'hoodie.datasource.write.recordkey.field': "hudi_key",
       'hoodie.datasource.write.table.name': "table_name",
       'hoodie.datasource.write.precombine.field': "ts",
       'hoodie.datasource.write.partitionpath.field': 'run_date',
       'hoodie.datasource.write.hive_style_partitioning': "true",
       'hoodie.datasource.write.drop.partition.columns': 'true',
       'hoodie.datasource.write.operation': "delete",
       'hoodie.datasource.write.table.type': "COPY_ON_WRITE",
       'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator'
   }

   # run_date values of the partitions whose records should be deleted
   removablePartitions = ['20230608051260', '20230609043200']
   deleteDataFrame = refinedDataFrame.filter(col("run_date").isin(removablePartitions))

   # Only issue the delete if at least one matching record exists
   if deleteDataFrame.first() is not None:
       deleteDataFrame.write \
           .format("org.apache.hudi") \
           .options(**hudiOptions) \
           .mode("append") \
           .save(f"s3://{s3Bucket}/{s3Prefix}")
   ```

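
   For comparison, Hudi also provides a partition-level `delete_partition` write operation, as opposed to the record-level `delete` used above. The sketch below only builds the options dict for it; the helper name `build_delete_partition_options` is hypothetical, and the assumption that `hoodie.datasource.write.partitions.to.delete` expects hive-style paths (`run_date=<value>`) rather than bare values should be checked against the 0.12.1 documentation.

   ```python
   from typing import Dict, List


   def build_delete_partition_options(
       table_name: str,
       partition_field: str,
       partitions: List[str],
       hive_style: bool = True,
   ) -> Dict[str, str]:
       """Hypothetical helper: Hudi write options for dropping whole partitions."""
       # Assumption: with hive-style partitioning the option takes paths
       # of the form "run_date=20230608051260"; otherwise the bare values.
       if hive_style:
           paths = [f"{partition_field}={p}" for p in partitions]
       else:
           paths = list(partitions)
       return {
           "hoodie.table.name": table_name,
           "hoodie.datasource.write.table.name": table_name,
           "hoodie.datasource.write.partitionpath.field": partition_field,
           "hoodie.datasource.write.hive_style_partitioning": str(hive_style).lower(),
           # Drop entire partitions instead of deleting record by record
           "hoodie.datasource.write.operation": "delete_partition",
           # Comma-separated list of partition paths to drop
           "hoodie.datasource.write.partitions.to.delete": ",".join(paths),
       }


   opts = build_delete_partition_options(
       "table_name", "run_date", ["20230608051260", "20230609043200"]
   )
   print(opts["hoodie.datasource.write.partitions.to.delete"])
   # run_date=20230608051260,run_date=20230609043200
   ```

   As far as I understand, either operation only writes a new commit on the timeline; the older data files are physically removed later by Hudi's cleaner, subject to its retention settings (for example `hoodie.cleaner.commits.retained`).
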
   **Expected behavior**
   
   I expected that running this code would remove the specified partitions from 
the Hudi table, including their underlying physical files on S3.
   
   **Environment Description**
   
   * Hudi version : 0.12.1
   
   * Spark version : 3.3
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : No
   

