wombatu-kun commented on issue #13219: URL: https://github.com/apache/hudi/issues/13219#issuecomment-2834787926
From the documentation:

> The MERGE INTO statement is similar to the UPDATE statement, but it allows you to specify different actions for matched and unmatched records.

The `matched` action should be performed on records that exist in both the target table and the source table. The `not matched` action should be performed on source records that do not exist in the target table.

If the target table is COW, it works exactly as described above. For a MOR target table, however, it does not actually check whether records exist in the target table when deciding which action to execute (`matched` or `not matched`). Instead, the decision is based on the presence of the `currentLocation` field, which is set by the `HoodieIndex.tagLocation()` procedure.

For example, if we use the Bucket Simple index with the bucket number set to 1 and run a MERGE INTO command, `tagLocation` produces the same `currentLocation` for all source records, regardless of whether those records exist in the target table. As a result, only the `matched` action is executed for every source record. This gives incorrect results, because records that do not exist in the target table should be handled by the `not matched` action.
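To make the failure mode concrete, here is a minimal Python sketch that models the behaviour described above in simplified form. Everything in it is hypothetical: `SourceRecord`, `tag_location_bucket`, `route_by_location` and `route_by_existence` are illustrative names, `current_location` only stands in for Hudi's `currentLocation`, and CRC32 is a placeholder for whatever hash the real Bucket index uses. It assumes that a bucket-style index assigns a location whenever the record's bucket already has a file group, independent of whether that particular key is stored there.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class SourceRecord:
    key: str
    # Hypothetical stand-in for Hudi's currentLocation (None = untagged).
    current_location: Optional[str] = None


def bucket_id(key: str, num_buckets: int) -> int:
    # Placeholder hash (assumption): Hudi's real bucket hashing differs.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def tag_location_bucket(records: List[SourceRecord],
                        bucket_to_file_group: Dict[int, str],
                        num_buckets: int) -> List[SourceRecord]:
    # Simplified bucket-index tagging: the location depends only on the
    # bucket the key hashes to, not on whether the key exists in the table.
    for r in records:
        r.current_location = bucket_to_file_group.get(bucket_id(r.key, num_buckets))
    return records


def route_by_location(records: List[SourceRecord]) -> Tuple[List[str], List[str]]:
    # Decision described for MOR targets: "matched" iff a location was tagged.
    matched = [r.key for r in records if r.current_location is not None]
    not_matched = [r.key for r in records if r.current_location is None]
    return matched, not_matched


def route_by_existence(records: List[SourceRecord],
                       target_keys: Set[str]) -> Tuple[List[str], List[str]]:
    # Expected MERGE INTO semantics: "matched" iff the key exists in the target.
    matched = [r.key for r in records if r.key in target_keys]
    not_matched = [r.key for r in records if r.key not in target_keys]
    return matched, not_matched


if __name__ == "__main__":
    target_keys = {"id1", "id2"}
    # With a single bucket, the one existing file group covers every key.
    bucket_to_file_group = {0: "fg-0"}
    source = [SourceRecord("id1"), SourceRecord("id3")]
    tag_location_bucket(source, bucket_to_file_group, num_buckets=1)
    print(route_by_location(source))                # (['id1', 'id3'], [])
    print(route_by_existence(source, target_keys))  # (['id1'], ['id3'])
```

With `num_buckets=1` every key hashes to bucket 0, so `id3` (absent from the target) is tagged with a location and routed to `matched`, while an existence check would route it to `not matched`.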
