nsivabalan commented on issue #4778: URL: https://github.com/apache/hudi/issues/4778#issuecomment-1039519922
yes, unfortunately this behavior can't be fixed easily. For example:

1. insert rec1 at time t0
2. delete rec1 at time t100
3. re-insert rec1 at time t100000

Let's say there were hundreds of commits between (2) and (3). Hudi can't keep remembering every record it has ever seen. So after (2), on the next merge, Hudi removes rec1 from its storage, and if rec1 is ingested again later, Hudi considers it an insert record. The alternative would be for Hudi to track every record that was ever inserted and deleted, forever, which doesn't make sense for a large analytical storage system.

Regarding your statement "But the target table will contain a row with the values from the delete, whereas this row should not be inserted into the target table in any way (same as above).": the delete record will be ingested into Hudi, but with `_hoodie_is_deleted` set to true. During the next merge or compaction the record is removed, so it is part of storage only for a short duration.

Also, there are other ways to trigger deletes. Please check out the details [here](https://hudi.apache.org/blog/2020/01/15/delete-support-in-hudi/). Not all of them store the value with `_hoodie_is_deleted` set to true.
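To make the lifecycle above concrete, here is a minimal in-memory sketch of the soft-delete semantics described (the `Table` class and its methods are hypothetical illustrations, not Hudi's actual implementation; only the `_hoodie_is_deleted` field name comes from Hudi):

```python
# Simplified model: a delete is written as a tombstone payload with
# _hoodie_is_deleted=True; compaction physically removes it, after which
# the same key arriving again is treated as a fresh insert.

class Table:
    """Hypothetical sketch of an upsert table with soft deletes."""

    def __init__(self):
        self.records = {}  # record key -> payload dict

    def upsert(self, key, payload):
        # A delete payload still lands in storage; it just carries
        # the _hoodie_is_deleted flag until the next compaction.
        self.records[key] = payload

    def compact(self):
        # Merge/compaction drops flagged records, so the tombstone
        # lives in storage only for a short duration.
        self.records = {
            k: v for k, v in self.records.items()
            if not v.get("_hoodie_is_deleted", False)
        }

table = Table()
table.upsert("rec1", {"value": 1})                  # t0: insert
table.upsert("rec1", {"_hoodie_is_deleted": True})  # t100: delete (tombstone)
table.compact()                                     # rec1 removed from storage
table.upsert("rec1", {"value": 2})                  # t100000: re-insert,
                                                    # seen as a brand-new insert
```

After `compact()`, the table has no memory of rec1 ever existing, which is why the later upsert cannot be distinguished from a first-time insert.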
