wombatu-kun commented on issue #13219:
URL: https://github.com/apache/hudi/issues/13219#issuecomment-2871261601

   > One way is we check the `HoodieOperation` of the record, but first of all, 
it needs to be set up correctly, and it looks like the issue is only related 
with `DELETES`, I guess maybe we just need to distinguish the deletes by 
`HoodieRecord.isDelete`.
   
   Maybe I don't understand you correctly, but I disagree that "the issue is 
only related with DELETES". I think the issue is in how we decide which action 
to execute on each incoming record (matched or not matched).
   
   Let me clarify the case above:
   1. At the beginning, we have a target table of type MOR with records: 
(3,130,'aa3'), (5,150,'aa5'), (6,160,'aa6').
   2. Then we delete the record with id=3, so the target contains: 
(5,150,'aa5'), (6,160,'aa6').
   3. The source table contains these records: (3,130,'aa3'), (5,150,'aa5'), 
(6,160,'aa6'), (7,170,'aa7').
   4. Run merge into target from source: when matched then update with 
modification, when not matched then insert as is. The target table should 
become: (3,130,'aa3'), (5,151,'oo'), (6,161,'oo'), (7,170,'aa7').
   id=3 should be inserted as is (3,130,'aa3'), as it was previously deleted 
from the target.
   id=7 should be inserted as is (7,170,'aa7'), as it is new to the target.
   id=5 should be updated (150->151, aa5 -> oo)
   id=6 should be updated (160->161, aa6 -> oo)
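
   The expected outcome above can be sketched as a toy model. This is plain 
Python over dicts, not Hudi code; the `merge_into` helper and the tuple layout 
are made up for illustration, and "update with modification" is assumed to mean 
"add 1 to the value and set the name to 'oo'", as in the example.

```python
# Toy model of the expected MERGE INTO semantics (not Hudi code).
# Keys are record ids; values are (value, name) tuples.
target = {5: (150, "aa5"), 6: (160, "aa6")}  # id=3 has already been deleted
source = {3: (130, "aa3"), 5: (150, "aa5"), 6: (160, "aa6"), 7: (170, "aa7")}


def merge_into(target, source):
    """Match on whether the key currently exists in the target."""
    result = dict(target)
    for key, (value, name) in source.items():
        if key in target:
            # when matched then update with modification
            result[key] = (value + 1, "oo")
        else:
            # when not matched then insert as is
            result[key] = (value, name)
    return result


expected = merge_into(target, source)
```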
   
   It works like this with COW (any index) and with MOR (INMEMORY index only).
   For MOR with the BUCKET index and number of buckets = 1: all records are 
updated (because currentLocation is always tagged for all records).
   For MOR with other indexes: 3, 5, 6 are updated and 7 is inserted (the 
previously deleted record 3 is treated as "matched").
   
   That's because it doesn't check whether the record exists in the target 
table when deciding which action to execute (matched or not matched); it just 
looks at currentLocation in HoodieRecord.
   And as you can see, for a MOR table with any index type except INMEMORY, 
the presence of currentLocation in HoodieRecord should not be taken as a sign 
that the record exists.
   If we somehow set up the `HoodieOperation` of the record correctly (by 
really checking the existence of records), we slow down performance 
significantly (especially for the BUCKET index). If not, MERGE INTO works 
incorrectly with MOR tables.
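
   To make the difference concrete, here is a rough sketch of the three 
behaviours described above. It is a toy model, not Hudi code: `merge_actions` 
is a hypothetical helper, and the "tagged" sets are my assumption about which 
keys end up with a currentLocation for each index type in this scenario.

```python
# Toy model (not Hudi code): decide matched / not matched per source key.
source_keys = {3, 5, 6, 7}
live_keys = {5, 6}  # records that really exist in the target (id=3 was deleted)


def merge_actions(source_keys, matched_keys):
    """Return the action chosen for each source key."""
    return {
        key: ("update" if key in matched_keys else "insert")
        for key in sorted(source_keys)
    }


# Decision based on real existence in the target: the expected behaviour.
by_existence = merge_actions(source_keys, live_keys)

# Assumed BUCKET index with one bucket: every key is tagged with a location.
by_bucket = merge_actions(source_keys, set(source_keys))

# Assumed other indexes on MOR: the deleted key 3 still has a location.
by_other_index = merge_actions(source_keys, {3, 5, 6})
```

   Only the first variant inserts id=3; the other two treat a location as 
proof of existence, which is the behaviour reported in this issue.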
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
