[
https://issues.apache.org/jira/browse/HUDI-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vinoth Chandar updated HUDI-1127:
---------------------------------
Sprint: Hudi-Sprint-Jan-24, Hudi-Sprint-Jan-31, Hudi-Sprint-Feb-7,
Hudi-Sprint-Feb-14 (was: Hudi-Sprint-Jan-24, Hudi-Sprint-Jan-31,
Hudi-Sprint-Feb-7)
> Handling late arriving Deletes
> ------------------------------
>
> Key: HUDI-1127
> URL: https://issues.apache.org/jira/browse/HUDI-1127
> Project: Apache Hudi
> Issue Type: Improvement
> Components: deltastreamer, writer-core
> Affects Versions: 0.9.0
> Reporter: Bhavani Sudha
> Assignee: Alexey Kudinkin
> Priority: Blocker
> Labels: sev:high
> Fix For: 0.11.0
>
>
> Recently I was working on a [PR|https://github.com/apache/hudi/pull/1704] to
> enhance the OverwriteWithLatestAvroPayload class to consider records in storage
> when merging. Briefly, this class will ignore older updates if the record in
> storage is the latest one (based on the precombine field).
> Based on this, the expectation is that every write operation is handled the
> same way: if it is older than the record in storage, it should be ignored.
> While working on this, I found that we cannot handle all deletes the same way.
> This is because we mainly process deletes in two ways:
> * by adding the `_hoodie_is_deleted` field to the original record, setting it
> to true, and sending the write as an UPSERT operation.
> * by using an empty payload (EmptyHoodieRecordPayload) and sending the write
> as a DELETE operation.
> While the former carries the ordering field and can be processed as expected
> (older deletes are ignored), the latter has no ordering field to tell whether
> it is an older delete, and hence lets the older delete go through.
> Just opening this issue to track this gap. We need to identify the right
> approach here and fix it as needed.
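The gap described above can be sketched as a small merge function. This is a minimal, self-contained Java 17 illustration, not Hudi's payload API: `Row`, `Upsert`, `HardDelete`, and `merge` are hypothetical names, `orderingValue` stands in for the precombine field, `isDeleted` mirrors `_hoodie_is_deleted`, and resolving ties in favor of the incoming write is an assumption. It shows why an older soft delete can be ignored while a hard DELETE, which carries only the key, cannot.

```java
import java.util.Optional;

public class LateDeleteSketch {

    // Hypothetical stand-in for a stored record: key, ordering (precombine) value,
    // a soft-delete flag mirroring `_hoodie_is_deleted`, and some data.
    record Row(String key, long orderingValue, boolean isDeleted, String data) {}

    // Hypothetical incoming write: an upsert with a full row, or a hard DELETE
    // that, like an empty payload, carries only the key and no ordering value.
    sealed interface Write permits Upsert, HardDelete {}
    record Upsert(Row row) implements Write {}
    record HardDelete(String key) implements Write {}

    // Merges an incoming write with the record in storage.
    // Returns the surviving row, or empty if the key ends up deleted.
    static Optional<Row> merge(Optional<Row> stored, Write incoming) {
        if (incoming instanceof HardDelete) {
            // Nothing to compare against: the delete is applied unconditionally,
            // even when it is older than the stored record.
            return Optional.empty();
        }
        Row newRow = ((Upsert) incoming).row();
        if (stored.isPresent() && stored.get().orderingValue() > newRow.orderingValue()) {
            // Older write (including an older soft delete): keep the stored record.
            return stored;
        }
        // Newer (or equal, by assumption) write wins; a soft delete removes the key.
        return newRow.isDeleted() ? Optional.empty() : Optional.of(newRow);
    }

    public static void main(String[] args) {
        Optional<Row> stored = Optional.of(new Row("k1", 10, false, "v10"));

        // Late soft delete (ordering 5 < 10) is ignored, so the record survives.
        System.out.println(merge(stored, new Upsert(new Row("k1", 5, true, null))).isPresent()); // prints "true"
        // Late hard delete has no ordering value, so it still removes the record.
        System.out.println(merge(stored, new HardDelete("k1")).isPresent()); // prints "false"
    }
}
```

The asymmetry lives entirely in the `HardDelete` branch: with no ordering value attached, the merge has no basis for rejecting a stale delete.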