wombatu-kun commented on issue #13219:
URL: https://github.com/apache/hudi/issues/13219#issuecomment-2834016064

   Here are the results of this test for different table types and indexes:
   ```
   correct:
   COW any idx:  [3,130,aa3], [5,151,oo], [6,161,oo], [7,170,aa7]
   MOR INMEMORY: [3,130,aa3], [5,151,oo], [6,161,oo], [7,170,aa7]
   incorrect:
   MOR BUCKET 1: [3,131,oo], [5,151,oo], [6,161,oo], [7,171,oo]
   MOR BUCKET 5: [3,131,oo], [5,151,oo], [6,161,oo], [7,170,aa7]
   MOR SIMPLE:   [3,131,oo], [5,151,oo], [6,161,oo], [7,170,aa7]
   MOR BLOOM:    [3,131,oo], [5,151,oo], [6,161,oo], [7,170,aa7]
   ```
   As I understand it, the MERGE INTO logic decides whether to update or insert each record based on its currentLocation. So if the HoodieIndex has somehow computed a currentLocation for a record, that record goes through the "when matched" execution branch regardless of whether it actually exists in the target table. This logic is fine for a regular upsert, but as the results above show, it is not suitable for MERGE INTO.
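   To make this failure mode concrete, here is a toy Python model, not Hudi code. All names (`Incoming`, `tag_with_bucket_style_index`, `route_by_location`, `route_by_existence`) are hypothetical. The "index" is assumed to hash keys to buckets and to return a bucket's location whenever that bucket already has a file group, without checking whether the key itself is stored there. The model contrasts routing on currentLocation alone with routing on actual key existence.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Toy model for illustration only; these names are hypothetical, not Hudi classes.


@dataclass
class Incoming:
    key: int
    # Set during tagging; plays the role of a record's currentLocation.
    current_location: Optional[str] = None


def tag_with_bucket_style_index(
    records: List[Incoming], existing_buckets: Set[str], num_buckets: int
) -> None:
    """Hypothetical index: tags a record with its bucket if that bucket already
    has a file group, without checking whether the key itself is stored there."""
    for r in records:
        bucket = f"bucket-{r.key % num_buckets}"
        r.current_location = bucket if bucket in existing_buckets else None


def route_by_location(records: List[Incoming]) -> Dict[int, str]:
    """Routing as described above: any tagged record counts as matched."""
    return {
        r.key: ("matched" if r.current_location is not None else "not matched")
        for r in records
    }


def route_by_existence(records: List[Incoming], target_keys: Set[int]) -> Dict[int, str]:
    """Routing MERGE INTO semantics call for: matched only if the key exists in the target."""
    return {
        r.key: ("matched" if r.key in target_keys else "not matched")
        for r in records
    }


if __name__ == "__main__":
    target_keys = {1, 2}                   # keys already stored in the target table
    incoming = [Incoming(1), Incoming(3)]  # key 3 is new
    # With a single bucket, every key maps to the existing "bucket-0".
    tag_with_bucket_style_index(incoming, existing_buckets={"bucket-0"}, num_buckets=1)
    print(route_by_location(incoming))                # {1: 'matched', 3: 'matched'}
    print(route_by_existence(incoming, target_keys))  # {1: 'matched', 3: 'not matched'}
```

   With one bucket, every incoming key lands in an existing bucket, so the brand-new key 3 is routed to "when matched". This resembles the `MOR BUCKET 1` row in the results; whether Hudi's indexes tag records in exactly this way is an assumption of the model.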
   
   @danny0405 @yihua @nsivabalan @jonvex @codope @vinothchandar what do you think is the best way to fix this?
   Could you advise on the proper place in the code to change this logic: the indexes, `HoodieSparkSqlWriter`, the MergeInto command, or somewhere else?


-- 
This is an automated message from the Apache Git Service.