mapdata-solutions opened a new issue, #13013: URL: https://github.com/apache/hudi/issues/13013
**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
- Join the mailing list to engage in conversations and get faster support at [email protected].
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

Hard delete operations fail specifically for Hudi tables that have a single record key with partitioning, while working correctly for all other table configurations.

Configurations tested:

1. Single record key, no partitions - WORKING
2. Single record key, with partitions - NOT WORKING
3. Multiple record keys, no partitions - WORKING
4. Multiple record keys, with partitions - WORKING

Configuration used in the delete operation:

```python
write_conf = {
    'hoodie.table.name': table_name,
    'hoodie.datasource.write.recordkey.field': '_id',  # Single record key
    'hoodie.datasource.write.precombine.field': precombine_key,
    'hoodie.datasource.write.operation': 'delete',
    'hoodie.populate.meta.fields': 'true',
    'hoodie.cleaner.policy': 'KEEP_LATEST_COMMITS',
    'hoodie.cleaner.commits.retained': '1',
    'hoodie.metadata.ignore.spurious.deletes': 'false',
    'hoodie.bloom.index.prune.by.ranges': 'false',
    'hoodie.datasource.write.payload.class': 'org.apache.hudi.common.model.EmptyHoodieRecordPayload'
}

# Partition configuration
partition_conf = {
    'hoodie.datasource.write.partitionpath.field': 'year,month',
    'hoodie.datasource.write.hive_style_partitioning': 'true',
    'hoodie.table.keygenerator.class': 'org.apache.hudi.keygen.CustomKeyGenerator'
}
```

**To Reproduce**

Steps to reproduce the behavior:

1. Create Hudi tables with the following configurations:
   a. Single record key without partitions
   b. Single record key with partitions (year, month)
   c. Multiple record keys without partitions
   d. Multiple record keys with partitions
2. Insert sample data into all tables.
3. Attempt a hard delete using the following code:

```python
delete_df.write.format("org.apache.hudi") \
    .options(**hudi_options) \
    .mode("append") \
    .save(table_path)
```

**Expected behavior**

Records should be deleted from all table configurations when performing a hard delete.

**Actual behavior**

The delete operation works for:

- Single record key without partitions
- Multiple record keys without partitions
- Multiple record keys with partitions

The delete operation fails for:

- Single record key with partitions

**Impact**

This bug affects data deletion in partitioned tables with single record keys, which is a common use case in data lakes. It prevents proper data cleanup and management of these tables.

**Additional notes**

- The delete operation doesn't throw an error but fails to remove the records.
- The schema and partition information are correctly maintained in the `delete_df`.
- The same configuration works for the other table types.
- `CustomKeyGenerator` is used, matching the table creation configuration.

**Possible investigation areas**

- Interaction between `CustomKeyGenerator` and a single record key in partitioned tables
- Partition path handling during delete operations
- Record key to partition mapping in delete scenarios

**Workaround**

Currently, no workaround has been identified for this specific configuration.

**Environment Description**

* Hudi version : 0.12.1
* Spark version : 3.3
* Hive version : 3.1.3
* Hadoop version : 3.3.3
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : No

**Stacktrace**

N/A - the delete operation completes without throwing an error; the records are simply not removed.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
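One detail worth checking against the configuration above (an assumption to verify, not a confirmed root cause): the Hudi docs for `CustomKeyGenerator` describe the partition path field as carrying an explicit per-field type suffix, e.g. `year:SIMPLE,month:SIMPLE`, rather than the bare `year,month` used here. A minimal plain-Python sketch of a merged delete configuration using that syntax, with `table_name` and `precombine_key` as placeholder values:

```python
# Sketch only: merge the write and partition configs for the delete write,
# using the field:TYPE partition syntax that CustomKeyGenerator documents.
# table_name and precombine_key are placeholders, not values from the issue.
table_name = "my_table"
precombine_key = "updated_at"

write_conf = {
    'hoodie.table.name': table_name,
    'hoodie.datasource.write.recordkey.field': '_id',  # single record key
    'hoodie.datasource.write.precombine.field': precombine_key,
    'hoodie.datasource.write.operation': 'delete',
    'hoodie.datasource.write.payload.class':
        'org.apache.hudi.common.model.EmptyHoodieRecordPayload',
}

partition_conf = {
    # Explicit partition type per field, as CustomKeyGenerator expects:
    'hoodie.datasource.write.partitionpath.field': 'year:SIMPLE,month:SIMPLE',
    'hoodie.datasource.write.hive_style_partitioning': 'true',
    'hoodie.table.keygenerator.class':
        'org.apache.hudi.keygen.CustomKeyGenerator',
}

# Merge so the delete write sees the same key generator settings as the insert
hudi_options = {**write_conf, **partition_conf}
print(hudi_options['hoodie.datasource.write.partitionpath.field'])
```

Passing the merged `hudi_options` to the `delete_df.write` call above keeps the insert and delete paths on identical key-generation settings, which rules out one source of silent key mismatches.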
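On the "record key to partition mapping" investigation area: a delete that succeeds but removes nothing is consistent with the delete's generated keys simply not matching the keys stored at insert time. As an illustration only (this is not Hudi's actual code), single-field key generators typically emit the bare field value, while multi-field generators emit `field:value` pairs, so a mismatch between the two conventions would make every delete key miss:

```python
# Illustration of two common record-key rendering conventions.
# Not Hudi's implementation; field names mirror the issue's schema.
def simple_key(row, field):
    """Single-field style: bare value."""
    return str(row[field])

def complex_key(row, fields):
    """Multi-field style: comma-joined field:value pairs."""
    return ",".join(f"{f}:{row[f]}" for f in fields)

row = {"_id": 123, "year": 2024, "month": 3}

print(simple_key(row, "_id"))      # "123"
print(complex_key(row, ["_id"]))   # "_id:123"

# If the insert path and the delete path end up using different
# conventions for the same single-key table, "_id:123" never matches
# "123" and the delete becomes a silent no-op.
```

Comparing the `_hoodie_record_key` meta column of the failing table against the keys the delete write generates would confirm or rule this out.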
