mapdata-solutions opened a new issue, #13013:
URL: https://github.com/apache/hudi/issues/13013

   Description:
   Hard delete operations fail specifically for Hudi tables configured with a single record key and partitioning, while the same operation works correctly for the other table configurations tested.
   
   Configurations Tested:
   1. Single Record Key, No Partitions - WORKING
   2. Single Record Key, With Partitions - NOT WORKING
   3. Multiple Record Keys, No Partitions - WORKING
   4. Multiple Record Keys, With Partitions - WORKING
   
   Configuration Used in Delete Operation:
   ```python
   write_conf = {
       'hoodie.table.name': table_name,
       'hoodie.datasource.write.recordkey.field': '_id',  # Single record key
       'hoodie.datasource.write.precombine.field': precombine_key,
       'hoodie.datasource.write.operation': 'delete',
       'hoodie.populate.meta.fields': 'true',
       'hoodie.cleaner.policy': 'KEEP_LATEST_COMMITS',
       'hoodie.cleaner.commits.retained': '1',
       'hoodie.metadata.ignore.spurious.deletes': 'false',
       'hoodie.bloom.index.prune.by.ranges': 'false',
       'hoodie.datasource.write.payload.class': 'org.apache.hudi.common.model.EmptyHoodieRecordPayload'
   }
   
   # Partition Configuration
   partition_conf = {
       'hoodie.datasource.write.partitionpath.field': 'year,month',
       'hoodie.datasource.write.hive_style_partitioning': 'true',
       'hoodie.table.keygenerator.class': 'org.apache.hudi.keygen.CustomKeyGenerator'
   }
   ```
   
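   For reference, a minimal sketch (an assumption — the issue does not show this step) of how the two dicts above would be combined into the `hudi_options` used in the delete call in the repro steps; both dicts are abbreviated to one key each:

   ```python
   # Hedged sketch: combining the delete config and the partition config
   # into one options dict via a simple dict merge (keys abbreviated).
   write_conf = {'hoodie.datasource.write.operation': 'delete'}
   partition_conf = {'hoodie.datasource.write.partitionpath.field': 'year,month'}

   # Later keys win on collision; here the two dicts are disjoint.
   hudi_options = {**write_conf, **partition_conf}
   print(hudi_options)
   ```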
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Create Hudi tables with the following configurations:
      a. Single record key without partitions
      b. Single record key with partitions (year, month)
      c. Multiple record keys without partitions
      d. Multiple record keys with partitions
   
   2. Insert sample data into all tables
   
   3. Attempt a hard delete operation using the following code:

   ```python
   delete_df.write.format("org.apache.hudi") \
       .options(**hudi_options) \
       .mode("append") \
       .save(table_path)
   ```
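   One hedged note on why the partitioned case can fail silently: with Hudi's default, non-global index, deletes are matched within a partition, so `delete_df` must carry the partition columns (`year`, `month`) alongside the record key; if the partition path derived for a delete record doesn't match the one the data was written under, the delete matches nothing and no error is raised. A plain-Python stand-in for selecting those columns (row values are made up):

   ```python
   # Stand-in for delete_df.select("_id", "year", "month"): with a
   # non-global index the partition fields must travel with the record
   # key so each delete can be routed to the right partition.
   # Row values below are illustrative only.
   rows = [
       {"_id": "a1", "year": 2024, "month": 1, "value": 10},
       {"_id": "b2", "year": 2024, "month": 2, "value": 20},
   ]
   delete_cols = ["_id", "year", "month"]  # record key + partition path fields
   delete_rows = [{c: r[c] for c in delete_cols} for r in rows]
   print(delete_rows)
   ```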
   
   Expected Behavior:
   Records should be deleted from all table configurations when performing a hard delete operation.

   Actual Behavior:
   The delete operation works for:
   - Single record key without partitions
   - Multiple record keys without partitions
   - Multiple record keys with partitions

   The delete operation fails for:
   - Single record key with partitions
   Impact:
   This bug affects data deletion operations in partitioned tables with single 
record keys, which is a common use case in data lakes. It prevents proper data 
cleanup and management of these tables.
   
   Additional Notes:
   - The delete operation doesn't throw an error but fails to remove the records
   - The schema and partition information are correctly maintained in the delete_df
   - The same configuration works for the other table types
   - CustomKeyGenerator is used, as per the table creation configuration

   Possible Investigation Areas:
   - Interaction between CustomKeyGenerator and a single record key in partitioned tables
   - Partition path handling during delete operations
   - Record key to partition mapping in delete scenarios
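   On the first investigation area, one concrete thing worth checking (a fact from the Hudi key-generator docs, not from this report): `CustomKeyGenerator` expects each partition field to carry a type suffix, e.g. `year:simple,month:simple`, while the config above passes bare `year,month`. A small sketch of that format check (the helper below is hypothetical, written only for this note):

   ```python
   # Hypothetical helper: check whether a partitionpath.field value uses
   # the "field:type" syntax CustomKeyGenerator expects, where type is
   # "simple" or "timestamp" per the Hudi docs.
   def has_type_suffix(partition_field_conf: str) -> bool:
       parts = partition_field_conf.split(",")
       return all(
           p.count(":") == 1 and p.split(":")[1] in ("simple", "timestamp")
           for p in parts
       )

   print(has_type_suffix("year,month"))                # bare fields, as in the issue
   print(has_type_suffix("year:simple,month:simple"))  # documented syntax
   ```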
   Workaround:
   Currently, no workaround has been identified for this specific configuration.
             
         
   
   
   **Environment Description**
   
   * Hudi version :  0.12.1
   
   * Spark version : 3.3
   
   * Hive version : 3.1.3
   
   * Hadoop version : 3.3.3
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : No
   
   
   **Additional context**
   
   
   **Stacktrace**
   
   No stacktrace is available; the delete operation completes without any error, but the records are not removed.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
