maduraitech opened a new issue, #6503:
URL: https://github.com/apache/hudi/issues/6503

   Use case: We are trying to use MERGE INTO to update a subset of columns 
when a record matches, and otherwise insert the record as new, in a single command.
   
   Issue: Data is not updated as expected. Instead, the merge inserts records 
that already exist, which creates duplicates. 
   It does update a few rows.
   When we rerun the same MERGE INTO statement with the same data, it always 
inserts new rows, and for certain rows it keeps applying the update on every run.
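
   For reference, the behaviour we expect can be sketched in plain Python. This is only a model of the intended MERGE INTO semantics, not Hudi's implementation; the function `merge_into`, the `status` and `note` columns, and the sample rows are hypothetical and chosen for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def merge_into(
    target: List[Row],
    source: Iterable[Row],
    key_cols: Tuple[str, ...],
    update_cols: Tuple[str, ...],
) -> List[Row]:
    """Model of MERGE INTO: partial update on key match, insert otherwise."""
    # Index the target rows by their composite record key.
    by_key = {tuple(r[k] for k in key_cols): dict(r) for r in target}
    for src in source:
        key = tuple(src[k] for k in key_cols)
        if key in by_key:
            # WHEN MATCHED: overwrite only the listed columns.
            for col in update_cols:
                by_key[key][col] = src[col]
        else:
            # WHEN NOT MATCHED: insert the full source row.
            by_key[key] = dict(src)
    return list(by_key.values())


# Hypothetical sample data with a two-column key (col1, col2).
target = [
    {"col1": 1, "col2": "a", "status": "old", "note": "keep"},
]
source = [
    {"col1": 1, "col2": "a", "status": "new", "note": "ignored"},
    {"col1": 2, "col2": "b", "status": "new", "note": "inserted"},
]

once = merge_into(target, source, ("col1", "col2"), ("status",))
twice = merge_into(once, source, ("col1", "col2"), ("status",))
```

   Under this model, rerunning the merge with the same source leaves the table unchanged (`twice == once`), and the row count only grows when a key is genuinely new. What we observe instead is that the row count grows on every run.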
   
   **Environment Description:**
   Hudi: 0.11.0
   Spark: 2.4.8
   Storage: GCS
   
   More Details: 
   When we tried a similar use case on small tables, it worked fine.
   **We also tried the following additional options:**
   We added the Hudi write configs below when creating the table. We did not 
see much difference; if anything, the column that was previously being updated 
for a few rows is no longer updated at all.
    
   Options (
   hoodie.datasource.write.table.type = 'COPY_ON_WRITE',
   primaryKey = 'col1,col2 etc.',
   hoodie.datasource.write.hive_style_partitioning = false,
   hoodie.datasource.write.operation = 'upsert',
   hoodie.datasource.write.payload.class = 'org.apache.hudi.common.model.DefaultHoodieRecordPayload',
   hoodie.datasource.write.keygenerator.class = 'org.apache.hudi.keygen.ComplexKeyGenerator'
   )
   In addition, we tried combining all of our keys into a single key column 
(in case the number of key columns was the problem) and then performing the 
merge. This made no difference in behaviour.
   
   **Please note:** We do not want to set a precombineField at the table level 
because it would force us to include that field in our updates, which is not 
part of our use case. In the lower-volume tests, the table did not have a 
precombineField either. 
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
