furquan1993 opened a new issue, #14363:
URL: https://github.com/apache/hudi/issues/14363

   ### Bug Description
   
   **What happened:**
   Hudi metadata fields are missing from the records of alternate commits.
   When I use `hoodie.datasource.write.operation` set to `upsert`, every alternate commit produces rows whose `after` column is missing the Hudi metadata fields; with `insert_overwrite`/`bulk_insert` this works fine.
   
   
   **What you expected:**
   I expect all records to contain the Hoodie metadata fields, e.g. `_hoodie_record_key`.
   
   **Steps to reproduce:**
   1. Create a new Hudi table with CDC enabled and write with the following properties:
   ```
   upsertHudiOptions.put("hoodie.table.name", tableName);
   upsertHudiOptions.put("hoodie.datasource.write.table.type", "MERGE_ON_READ");
   upsertHudiOptions.put("hoodie.datasource.write.operation", "upsert");
   upsertHudiOptions.put("hoodie.datasource.write.recordkey.field", "order_id");
   upsertHudiOptions.put("hoodie.datasource.write.precombine.field", "order_date");
   upsertHudiOptions.put(
       "hoodie.datasource.write.keygenerator.class",
       "org.apache.hudi.keygen.NonpartitionedKeyGenerator"
   );
   upsertHudiOptions.put("hoodie.table.cdc.enabled", "true");
   upsertHudiOptions.put("hoodie.table.cdc.supplemental.logging.mode", "DATA_BEFORE_AFTER");
   upsertHudiOptions.put("hoodie.compact.inline", "true");
   upsertHudiOptions.put("hoodie.compact.inline.max.delta.commits", "1");
   ```
   2. Insert a few records using multiple commits (more than 3).
   3. Run an incremental CDC query with the following read options:
   ```
   spark
       .read()
       .format("hudi")
       .option("hoodie.datasource.query.type", "incremental")
       .option("hoodie.datasource.query.incremental.format", "cdc")
       .option("hoodie.datasource.read.begin.instanttime", "0")
       .load(tablePath);
   ```
   4. The result is missing Hoodie metadata in the `after` column for alternate commits:
   ```
   +---+-----------------+------+-----------------------------------------------------------------------------+
   |op |ts_ms            |before|after                                                                        |
   +---+-----------------+------+-----------------------------------------------------------------------------+
   |i  |20251125122540173|NULL  |{"_hoodie_commit_time":"20251125122540173","_hoodie_commit_seqno":"20251125122540173_0_0","_hoodie_record_key":"0","_hoodie_partition_path":"","_hoodie_file_name":"a134a609-4418-427a-9a3e-cbfca61f2912-0_0-26-22_20251125122540173.parquet","order_id":0,"order_customer_id":101,"order_status":"PENDING","order_date":19723,"policy_types":["AUTO","HOME"]}|
   |i  |20251125122548634|null  |{"order_id": 1, "order_customer_id": 101, "order_status": "PENDING", "order_date": 19723, "policy_types": ["AUTO", "HOME"]}|
   |i  |20251125122548634|null  |{"order_id": 2, "order_customer_id": 102, "order_status": "PENDING", "order_date": 19724, "policy_types": ["LIFE", "HEALTH"]}|
   |i  |20251125122548634|null  |{"order_id": 3, "order_customer_id": 103, "order_status": "SHIPPED", "order_date": 19725, "policy_types": ["AUTO"]}|
   |i  |20251125122548634|null  |{"order_id": 4, "order_customer_id": 104, "order_status": "CANCELLED", "order_date": 19726, "policy_types": ["TRAVEL", "LIFE"]}|
   |i  |20251125122548634|null  |{"order_id": 5, "order_customer_id": 105, "order_status": "PENDING", "order_date": 19727, "policy_types": ["HOME"]}|
   |i  |20251125122548634|null  |{"order_id": 6, "order_customer_id": 106, "order_status": "DELIVERED", "order_date": 19728, "policy_types": ["HEALTH", "AUTO"]}|
   |i  |20251125122548634|null  |{"order_id": 7, "order_customer_id": 107, "order_status": "PENDING", "order_date": 19729, "policy_types": ["TRAVEL"]}|
   |i  |20251125122548634|null  |{"order_id": 8, "order_customer_id": 108, "order_status": "SHIPPED", "order_date": 19730, "policy_types": ["AUTO", "LIFE"]}|
   |i  |20251125122548634|null  |{"order_id": 9, "order_customer_id": 109, "order_status": "CANCELLED", "order_date": 19731, "policy_types": ["HEALTH"]}|
   |i  |20251125122548634|null  |{"order_id": 10, "order_customer_id": 110, "order_status": "DELIVERED", "order_date": 19732, "policy_types": ["HOME", "TRAVEL"]}|
   |i  |20251125122554330|NULL  |{"_hoodie_commit_time":"20251125122554330","_hoodie_commit_seqno":"20251125122554330_0_0","_hoodie_record_key":"11","_hoodie_partition_path":"","_hoodie_file_name":"e49b926a-7526-4cc8-b5c9-52c40a392e53-0_0-85-97_20251125122554330.parquet","order_id":11,"order_customer_id":111,"order_status":"PENDING","order_date":19733,"policy_types":["LIFE","TRAVEL"]}|
   |i  |20251125122556999|null  |{"order_id": 12, "order_customer_id": 111, "order_status": "COMPLETED", "order_date": 19739, "policy_types": ["MOTOR", "TRAVEL"]}|
   |i  |20251125122559700|NULL  |{"_hoodie_commit_time":"20251125122559700","_hoodie_commit_seqno":"20251125122559700_0_0","_hoodie_record_key":"13","_hoodie_partition_path":"","_hoodie_file_name":"57d7fbf7-bd40-41a9-ad29-7dac8bffc362-0_0-143-186_20251125122559700.parquet","order_id":13,"order_customer_id":101,"order_status":"COMPLETED","order_date":19741,"policy_types":["MOTOR","CAR"]}|
   |i  |20251125122602592|null  |{"order_id": 14, "order_customer_id": 102, "order_status": "COMPLETED", "order_date": 19742, "policy_types": ["TRAVEL", "CAR"]}|
   +---+-----------------+------+-----------------------------------------------------------------------------+
   ```
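   A quick way to see which CDC rows lack metadata is to check the `after` payload for the `_hoodie_*` keys. A minimal sketch in plain Java (the class and helper names are hypothetical; it assumes the `after` column has been collected as a raw JSON string, e.g. via `row.getString(row.fieldIndex("after"))` on the query result):
   ```
   // Hypothetical helper: flags CDC rows whose `after` payload is missing Hudi metadata.
   public class CdcMetadataCheck {
       // A payload is considered complete only if the Hudi metadata keys are present.
       static boolean hasHoodieMetadata(String afterJson) {
           return afterJson != null
               && afterJson.contains("\"_hoodie_commit_time\"")
               && afterJson.contains("\"_hoodie_record_key\"");
       }

       public static void main(String[] args) {
           // Sample payloads taken from the output above.
           String withMeta = "{\"_hoodie_commit_time\":\"20251125122540173\",\"_hoodie_record_key\":\"0\",\"order_id\":0}";
           String withoutMeta = "{\"order_id\": 1, \"order_customer_id\": 101}";
           System.out.println(hasHoodieMetadata(withMeta));    // true
           System.out.println(hasHoodieMetadata(withoutMeta)); // false
       }
   }
   ```
   In the output above, every row written by an even-numbered commit fails this check, which matches the alternate-commit pattern described.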
    
   Attaching my code to reproduce the issue.
   [Main.java](https://github.com/user-attachments/files/23762656/Main.java)
   
   ### Environment
   
   **Hudi version:**
   All versions tested: 1.0.2 / 1.1.0 / 1.2.0-SNAPSHOT
   
   **Query engine:** (Spark/Flink/Trino etc)
   Tried using Spark
   
   **Relevant configs:**
   See the attached code for the full configuration:
   [Main.java](https://github.com/user-attachments/files/23762671/Main.java)
   
   ### Logs and Stack Trace
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]