mapdata-solutions commented on issue #13013:
URL: https://github.com/apache/hudi/issues/13013#issuecomment-2753314036

   Hi,
   
I'm trying to delete records from a Hudi table with multi-level Hive-style partitioning. Here's my table structure and configuration:
   
Table schema:
```text
root
 |-- _hoodie_commit_time: string (nullable = true)
 |-- _hoodie_commit_seqno: string (nullable = true)
 |-- _hoodie_record_key: string (nullable = true)
 |-- _hoodie_partition_path: string (nullable = true)
 |-- _hoodie_file_name: string (nullable = true)
 |-- id: decimal(38,0) (nullable = true)
 |-- commit_ts: timestamp (nullable = true)
 |-- year: string (nullable = true)
 |-- month: string (nullable = true)
```

Sample data:
```text
+-------+-------------------+----+-----+
|     id|    commit_ts      |year|month|
+-------+-------------------+----+-----+
|    561|2025-03-12 13:57:08|2025|03   |
|    558|2025-03-12 13:56:31|2025|03   |
|    563|2025-03-12 13:57:31|2025|03   |
|    565|2025-03-12 13:57:57|2025|03   |
|    559|2025-03-12 13:56:45|2025|03   |
+-------+-------------------+----+-----+
```

My table configuration:
```python
hudi_options = {
    'className': 'org.apache.hudi',
    'hoodie.table.name': table_name,
    'hoodie.datasource.write.recordkey.field': 'id',
    'hoodie.datasource.write.precombine.field': 'commit_ts',
    'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.CustomKeyGenerator',
    'hoodie.datasource.write.partitionpath.field': 'year:SIMPLE,month:SIMPLE',
    'hoodie.datasource.write.hive_style_partitioning': 'true',
    'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
    'hoodie.datasource.hive_sync.partition_fields': 'year,month',
    'hoodie.datasource.write.payload.class': 'org.apache.hudi.common.model.EmptyHoodieRecordPayload'
}
```
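This sketch shows what this configuration should produce for each record. It assumes that `CustomKeyGenerator` with `SIMPLE` fields and Hive-style partitioning builds the path by joining `field=value` segments with `/`. `hive_partition_path` is a hypothetical helper that only approximates the key generator for illustration. It is not Hudi code.

```python
def hive_partition_path(row: dict, partition_spec: str) -> str:
    """Approximate the Hive-style partition path for one record.

    Hypothetical helper: parses a CustomKeyGenerator-style spec such as
    'year:SIMPLE,month:SIMPLE' and joins field=value pairs with '/'.
    Only SIMPLE fields are handled in this sketch.
    """
    parts = []
    for entry in partition_spec.split(","):
        field, _, key_type = entry.strip().partition(":")
        if key_type != "SIMPLE":
            raise ValueError(f"only SIMPLE is handled in this sketch, got {key_type!r}")
        parts.append(f"{field}={row[field]}")
    return "/".join(parts)


sample = {"id": 561, "commit_ts": "2025-03-12 13:57:08", "year": "2025", "month": "03"}
print(hive_partition_path(sample, "year:SIMPLE,month:SIMPLE"))  # year=2025/month=03
```

The delete DataFrame should carry the same `year` and `month` values as the original write. Under the assumption above, it would then resolve to the same `year=2025/month=03` path the records were written to.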
I noticed in issue #13013 that deletes work with `partitionpath`. In my case, though, I'm using Hive-style partitioning on two fields, `year` and `month`. Could you please advise:

1. What is the correct way to perform deletes with this multi-level partitioning setup?
2. Should I modify the `CustomKeyGenerator` configuration for delete operations?
3. Are there any specific configurations needed for delete operations with Hive-style partitioning?
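Here is one possible approach, sketched under assumptions rather than offered as a confirmed answer. Keep the same record key, partition path, key generator and Hive-style settings as the write, so keys resolve identically. Then set `hoodie.datasource.write.operation` to `delete`.

Hudi's documentation describes two alternative ways to delete:

- the `delete` write operation
- an `EmptyHoodieRecordPayload` payload override

This sketch assumes the explicit operation and therefore drops the payload override. The helpers `delete_options` and `required_delete_columns` and the table name `my_table` are hypothetical.

```python
from typing import Dict, List

# A subset of the question's write options; "my_table" stands in for table_name.
WRITE_OPTIONS: Dict[str, str] = {
    "hoodie.table.name": "my_table",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "commit_ts",
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.CustomKeyGenerator",
    "hoodie.datasource.write.partitionpath.field": "year:SIMPLE,month:SIMPLE",
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "hoodie.datasource.write.payload.class": "org.apache.hudi.common.model.EmptyHoodieRecordPayload",
}


def delete_options(write_opts: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: reuse key/partition settings and switch to delete."""
    opts = dict(write_opts)  # copy so the table's write options stay unchanged
    opts["hoodie.datasource.write.operation"] = "delete"
    # Assumption: the explicit delete operation replaces the
    # EmptyHoodieRecordPayload override, so that override is dropped here.
    opts.pop("hoodie.datasource.write.payload.class", None)
    return opts


def required_delete_columns(opts: Dict[str, str]) -> List[str]:
    """Hypothetical helper: columns the delete DataFrame should carry so the
    key generator can rebuild the same record key and partition path.
    The precombine field is included conservatively."""
    cols = [c.strip() for c in opts["hoodie.datasource.write.recordkey.field"].split(",")]
    for entry in opts["hoodie.datasource.write.partitionpath.field"].split(","):
        cols.append(entry.split(":")[0].strip())  # "year:SIMPLE" -> "year"
    cols.append(opts["hoodie.datasource.write.precombine.field"])
    return cols


opts = delete_options(WRITE_OPTIONS)
print(opts["hoodie.datasource.write.operation"])  # delete
print(required_delete_columns(opts))  # ['id', 'year', 'month', 'commit_ts']
```

With PySpark, these options would then be used in a write along the lines of `to_delete.select(*required_delete_columns(opts)).write.format("hudi").options(**opts).mode("append").save(base_path)`. Here `to_delete` and `base_path` are placeholders.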
   

