[ https://issues.apache.org/jira/browse/HUDI-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ethan Guo updated HUDI-6199:
----------------------------
Description:
Delete operation in custom payloads after RFC-46: while looking into a 0.13.1
release [blocker|https://github.com/apache/hudi/pull/8573], I found that custom
payload implementations such as the AWS DMS payload and the Debezium payloads
were not properly migrated to the new APIs introduced by RFC-46, which causes
the delete operation to fail. Our tests did not catch this.
The current code assumes that delete records are marked by "_hoodie_is_deleted";
however, custom CDC payloads use an op field to mark deletes.
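To illustrate the mismatch, here is a minimal, hypothetical Java sketch (not Hudi's actual API) that models a record as a Map. The column name "Op" and the delete code "D" follow the AWS DMS convention and are assumptions here; the class and helper names are illustrative only.

```java
import java.util.Map;

// Hypothetical sketch: illustrates two delete-marker conventions, not Hudi's real code.
public class DeleteMarkerSketch {

    // Hudi's built-in boolean delete marker column.
    static final String HOODIE_IS_DELETED = "_hoodie_is_deleted";

    // Assumed AWS DMS operation column; a value of "D" marks a delete.
    static final String DMS_OP_FIELD = "Op";

    // Flag-only check: a record counts as a delete only if _hoodie_is_deleted is true.
    static boolean isDeleteByHoodieFlag(Map<String, Object> record) {
        return Boolean.TRUE.equals(record.get(HOODIE_IS_DELETED));
    }

    // CDC-style check: the operation column carries the delete.
    static boolean isDeleteByOpField(Map<String, Object> record, String opField, String deleteCode) {
        Object op = record.get(opField);
        return op != null && deleteCode.equals(op.toString());
    }

    public static void main(String[] args) {
        // A CDC delete record that does not set _hoodie_is_deleted.
        Map<String, Object> dmsDelete = Map.of("id", 1, DMS_OP_FIELD, "D");

        // The flag-only check misses it; the op-field check recognizes it.
        System.out.println(isDeleteByHoodieFlag(dmsDelete));                 // false
        System.out.println(isDeleteByOpField(dmsDelete, DMS_OP_FIELD, "D")); // true
    }
}
```

A record that sets "_hoodie_is_deleted" to true is caught by either code path, which is why that convention is unaffected.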
Impact:
OverwriteWithLatest payload (and OverwriteNonDefaultsWithLatestAvroPayload) are
not affected.
For all other custom payloads (AWSDmsAvroPayload, all Debezium payloads),
deletes are broken.
If "_hoodie_is_deleted" is used to mark deletes, there are no issues with
custom payloads.
COW:
Deleting a non-existent record will break unless "_hoodie_is_deleted" is used.
MOR:
Any delete will break unless "_hoodie_is_deleted" is used.
Writer:
All writers (Spark, Flink) except Spark SQL.
DefaultHoodieRecordPayload delete marker support in 0.14.0 is also affected.
was:
Delete operation in custom payloads after RFC-46: while looking into a 0.13.1
release [blocker|https://github.com/apache/hudi/pull/8573], I found that custom
payload implementations such as the AWS DMS payload and the Debezium payloads
were not properly migrated to the new APIs introduced by RFC-46, which causes
the delete operation to fail. Our tests did not catch this.
The current code assumes that delete records are marked by "_hoodie_is_deleted";
however, custom CDC payloads use an op field to mark deletes.
Impact:
OverwriteWithLatest payload (and OverwriteNonDefaultsWithLatestAvroPayload):
no issues.
For all other custom payloads (AWSDmsAvroPayload, all Debezium payloads),
deletes are broken.
If "_hoodie_is_deleted" is used to mark deletes, there are no issues with
custom payloads.
COW:
Deleting a non-existent record will break unless "_hoodie_is_deleted" is used.
MOR:
Any delete will break unless "_hoodie_is_deleted" is used.
Writer:
All writers (Spark, Flink) except Spark SQL.
DefaultHoodieRecordPayload delete marker support in 0.14.0 is also affected.
> CDC payloads with op field for deletes do not work
> --------------------------------------------------
>
> Key: HUDI-6199
> URL: https://issues.apache.org/jira/browse/HUDI-6199
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Blocker
> Fix For: 0.13.1