nsivabalan commented on issue #1941: URL: https://github.com/apache/hudi/issues/1941#issuecomment-685800288
This is expected for now. I will let @n3nash or @bvaradar take a call on how to go about this, but I can explain what's happening.

Let's say you insert a row into Hudi (using the simple key generator, with col1 as the row key and col2 as the partition path):

    // col1, col2, col3
    row_key1, pp_1, data_1

When it is inserted, Hudi adds meta columns:

    // hudi_rowkey, hudi_pp, hudi_fileId, hudi_commit_seq, hudi_commit_time, col1, col2, col3
    row_key1, pp_1, fileId1, abc, def, row_key1, pp_1, data_1

With any global index, if you upsert with a partition path different from what's in storage and the config (update partition path) is set to false (the default), the record will be upserted to the original partition path.

Record being upserted:

    row_key1, pp_2, data_2

Since pp_2 is different from pp_1, this record will be upserted to pp_1. Hence:

    row_key1, pp_1, fileId1, abc, def, row_key1, pp_2, data_2

Notice that all data columns are the same as passed in (especially pp_2); only the meta columns are fixed up to point to pp_1. But I wonder whether we might have the same issue with any global index. I need to investigate this further.

In the meantime, @simonqin: if you want the record to go into pp_2, try applying this patch and you should see the record getting upserted to pp_2. https://github.com/apache/hudi/pull/1978

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
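
To make the upsert behaviour described in the comment concrete, here is a minimal toy model in Python. It is not Hudi code and does not use any Hudi API: `StoredRecord`, `upsert`, and the `update_partition_path` parameter are hypothetical names chosen for illustration, a plain dict stands in for the global index, and only two of the meta columns are modelled. The `update_partition_path=True` branch is an assumption about the intended behaviour (the record moves to the new partition), not something the comment spells out.

```python
from dataclasses import dataclass


@dataclass
class StoredRecord:
    # Meta columns (only the two relevant ones are modelled here).
    hudi_rowkey: str
    hudi_pp: str
    # Data columns, stored exactly as the writer passed them in.
    col1: str
    col2: str
    col3: str


def upsert(table, row, update_partition_path=False):
    """Upsert `row` = (col1, col2, col3) into `table`.

    `table` is a dict keyed by row key; it plays the role of a global
    index, i.e. a key is unique across all partitions.
    """
    col1, col2, col3 = row
    existing = table.get(col1)
    if existing is not None and existing.hudi_pp != col2 and not update_partition_path:
        # Key already lives in another partition: keep the record there.
        target_pp = existing.hudi_pp
    else:
        # New key, same partition, or partition updates are allowed (assumed).
        target_pp = col2
    table[col1] = StoredRecord(col1, target_pp, col1, col2, col3)
    return table[col1]


table = {}
upsert(table, ("row_key1", "pp_1", "data_1"))
rec = upsert(table, ("row_key1", "pp_2", "data_2"))
# Meta partition path stays pp_1, while the data column col2 carries pp_2.
print(rec.hudi_pp, rec.col2, rec.col3)
```

In this sketch the mismatch between `hudi_pp` and `col2` after the second upsert is exactly the situation reported in the issue: the record physically stays in the original partition while its data columns reflect the incoming values.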
