[ 
https://issues.apache.org/jira/browse/HUDI-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf closed HUDI-802.
----------------------

> AWSDmsTransformer does not handle insert -> delete of a row in a single batch 
> correctly
> ---------------------------------------------------------------------------------------
>
>                 Key: HUDI-802
>                 URL: https://issues.apache.org/jira/browse/HUDI-802
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: DeltaStreamer
>            Reporter: Christopher Weaver
>            Assignee: Balaji Varadarajan
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.6.1
>
>
> The provided AWSDmsAvroPayload class 
> ([https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/payload/AWSDmsAvroPayload.java])
>  currently handles cases where the "Op" column is a "D" for updates, and 
> successfully removes the row from the resulting table. 
> However, when an insert is quickly followed by a delete on the row (e.g. DMS 
> processes them together and puts the update records together in the same 
> parquet file), the row incorrectly appears in the resulting table. In this 
> case, the record is not in the table and getInsertValue is called rather than 
> combineAndGetUpdateValue. Since the logic to check for a delete is in 
> combineAndGetUpdateValue, it is skipped and the delete is missed. Something 
> like this could fix this issue: 
> [https://github.com/Weves/incubator-hudi/blob/release-0.5.1/hudi-spark/src/main/java/org/apache/hudi/payload/CustomAWSDmsAvroPayload.java].
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to