[ https://issues.apache.org/jira/browse/HUDI-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bhavani Sudha resolved HUDI-802.
--------------------------------
Resolution: Fixed
> AWSDmsTransformer does not handle insert -> delete of a row in a single batch
> correctly
> ---------------------------------------------------------------------------------------
>
> Key: HUDI-802
> URL: https://issues.apache.org/jira/browse/HUDI-802
> Project: Apache Hudi
> Issue Type: Bug
> Components: DeltaStreamer
> Reporter: Christopher Weaver
> Assignee: sivabalan narayanan
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.6.0
>
>
> The provided AWSDmsAvroPayload class
> ([https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/payload/AWSDmsAvroPayload.java])
> currently handles the case where an update arrives with "D" in the "Op"
> column, and successfully removes the row from the resulting table.
> However, when an insert is quickly followed by a delete of the same row (e.g.
> DMS processes them together and writes both change records to the same
> parquet file), the row incorrectly appears in the resulting table. In this
> case, the record is not yet in the table, so getInsertValue is called rather
> than combineAndGetUpdateValue. Since the logic to check for a delete lives
> only in combineAndGetUpdateValue, it is skipped and the delete is missed.
> Something like this could fix the issue:
> [https://github.com/Weves/incubator-hudi/blob/release-0.5.1/hudi-spark/src/main/java/org/apache/hudi/payload/CustomAWSDmsAvroPayload.java].
>
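For illustration, the behaviour described in the issue can be sketched with a simplified stand-in for the payload class. This is not the real Hudi API and not necessarily the fix that was merged: records are modelled as plain maps rather than Avro records, the class name DmsPayloadSketch and the helper dropIfDelete are hypothetical, and the sketch assumes the delete record is the one that survives de-duplication within the batch. The only point it shows is that the "Op" == "D" check is shared by the insert path and the update path.

```java
import java.util.Map;
import java.util.Optional;

// Simplified, hypothetical stand-in for a record payload; NOT the real Hudi API.
// Records are plain maps here instead of Avro records.
final class DmsPayloadSketch {
    static final String OP_FIELD = "Op";   // DMS operation column named in the issue
    static final String DELETE_OP = "D";   // value that marks a delete

    private final Map<String, Object> incoming;

    DmsPayloadSketch(Map<String, Object> incoming) {
        this.incoming = incoming;
    }

    // Hypothetical helper: one delete check used by both code paths.
    private Optional<Map<String, Object>> dropIfDelete() {
        return DELETE_OP.equals(incoming.get(OP_FIELD))
                ? Optional.empty()
                : Optional.of(incoming);
    }

    // Insert path: the key is not yet in the table.
    Optional<Map<String, Object>> getInsertValue() {
        return dropIfDelete();
    }

    // Update path: the key already exists in the table.
    // The current value is ignored in this sketch; the incoming record wins.
    Optional<Map<String, Object>> combineAndGetUpdateValue(Map<String, Object> current) {
        return dropIfDelete();
    }

    public static void main(String[] args) {
        DmsPayloadSketch delete = new DmsPayloadSketch(Map.of("id", 1, "Op", "D"));
        DmsPayloadSketch insert = new DmsPayloadSketch(Map.of("id", 2, "Op", "I"));
        System.out.println(delete.getInsertValue().isPresent());
        System.out.println(insert.getInsertValue().isPresent());
    }
}
```

With the check shared this way, a record whose latest change in the batch is a delete yields an empty result on the insert path too, so the row is never written.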