nsivabalan commented on issue #1941:
URL: https://github.com/apache/hudi/issues/1941#issuecomment-685800288


   This is expected for now. I will let @n3nash or @bvaradar decide how to 
handle this, but I can explain what's happening. 
   
   Let's say you insert a row into Hudi (using the simple key generator, with 
col1 as the record key and col2 as the partition path):
   // col1 , col2, col3
   row_key1, pp_1, data_1
   When it's inserted, Hudi adds its meta columns in front of the data columns:
   // hudi_rowkey, hudi_pp, hudi_fileId, hudi_commit_seq, hudi_commit_time, col1, col2, col3
   row_key1, pp_1, fileId1, abc, def, row_key1, pp_1, data_1
   
   With any global index, if you upsert a record whose partition path differs 
from the one in storage, and the config (update partition path) is set to 
false (the default), the record is upserted into its original partition path. 
   Record being upserted: 
   row_key1, pp_2, data_2
   
   Since pp_2 is different from pp_1, this record is upserted into pp_1, 
   so the stored row becomes: 
   row_key1, pp_1, fileId1, abc, def, row_key1, pp_2, data_2
   
   Notice that all data columns are stored exactly as passed in (in particular 
col2 = pp_2); only the meta columns are adjusted to point to pp_1. 
   I suspect we might have the same issue with any global index, but I need 
to investigate this further. 
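
   To make the behaviour above concrete, here is a minimal toy model of a 
global-index upsert in Python. It is only a sketch, not Hudi code: the 
function `upsert`, the `table` and `index` dictionaries, and the 
`update_partition_path` parameter are hypothetical names standing in for the 
config described above, and only two of the meta columns are modelled. 

```python
# Toy model of an upsert through a global index. NOT Hudi code:
# every name here is hypothetical and exists only for illustration.

def upsert(table, index, record, update_partition_path=False):
    """Upsert `record` and return the row as it would be stored.

    table: {partition_path: {row_key: stored_row}}
    index: {row_key: partition_path}  (global: one entry per key)
    """
    key, incoming_pp = record["col1"], record["col2"]
    existing_pp = index.get(key)

    if existing_pp is None or existing_pp == incoming_pp:
        # New key, or same partition: write where the record says.
        target_pp = incoming_pp
    elif update_partition_path:
        # Flag on: drop the old copy and move the record.
        del table[existing_pp][key]
        target_pp = incoming_pp
    else:
        # Flag off (default): keep the record in its original partition.
        target_pp = existing_pp

    # Meta columns follow the target partition; data columns are
    # stored exactly as they were passed in.
    stored = {"hudi_rowkey": key, "hudi_pp": target_pp, **record}
    table.setdefault(target_pp, {})[key] = stored
    index[key] = target_pp
    return stored


if __name__ == "__main__":
    table, index = {}, {}
    upsert(table, index, {"col1": "row_key1", "col2": "pp_1", "col3": "data_1"})
    row = upsert(table, index, {"col1": "row_key1", "col2": "pp_2", "col3": "data_2"})
    # Meta partition path and data partition path now disagree.
    print(row["hudi_pp"], row["col2"])
```

   With the flag left at its default, the second call leaves the row in pp_1 
while col2 still carries pp_2, which is the mismatch described above. Passing 
`update_partition_path=True` moves the row to pp_2 instead. 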
   
   In the meantime, @simonqin: if you want the record to go into pp_2, try 
applying this patch, and you should see the record upserted into pp_2. 
https://github.com/apache/hudi/pull/1978
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

