Christopher Weaver created HUDI-802:
---------------------------------------

             Summary: AWSDmsTransformer does not handle insert -> delete of a 
row in a single batch correctly
                 Key: HUDI-802
                 URL: https://issues.apache.org/jira/browse/HUDI-802
             Project: Apache Hudi (incubating)
          Issue Type: Bug
          Components: DeltaStreamer
            Reporter: Christopher Weaver


The provided AWSDmsAvroPayload class 
([https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/payload/AWSDmsAvroPayload.java])
 currently handles the case where the "Op" column is "D" on an update to a row 
that already exists in the table, and successfully removes that row from the 
resulting table. 

However, when an insert is quickly followed by a delete of the same row (e.g. 
DMS processes them together and writes both change records to the same 
parquet file), the row incorrectly appears in the resulting table. In this 
case, the record is not yet in the table, so getInsertValue is called rather 
than combineAndGetUpdateValue. Since the check for a delete lives only in 
combineAndGetUpdateValue, it is skipped and the delete is missed. Something 
like this could fix the issue: 
[https://github.com/Weves/incubator-hudi/blob/release-0.5.1/hudi-spark/src/main/java/org/apache/hudi/payload/CustomAWSDmsAvroPayload.java].
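
To illustrate the idea, here is a minimal, self-contained sketch of one 
possible shape for such a fix. It does not use the real Hudi or Avro classes: 
DmsPayloadSketch is a hypothetical stand-in, a plain Map replaces the Avro 
record, and java.util.Optional replaces Hudi's Option type. Only the "Op" 
column and the two method names come from the description above. The linked 
CustomAWSDmsAvroPayload may be implemented differently.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a DMS payload class; NOT the real Hudi API.
final class DmsPayloadSketch {
    static final String OP_FIELD = "Op";

    // Incoming change record (a Map stands in for an Avro record here).
    private final Map<String, Object> record;

    DmsPayloadSketch(Map<String, Object> record) {
        this.record = record;
    }

    // True when the incoming record carries Op = "D" (case-insensitive).
    private boolean isDelete() {
        Object op = record.get(OP_FIELD);
        return op != null && "D".equalsIgnoreCase(op.toString());
    }

    // Insert path: the key is not in the table yet.
    // Checking for a delete here covers insert -> delete in one batch.
    Optional<Map<String, Object>> getInsertValue() {
        return isDelete() ? Optional.empty() : Optional.of(record);
    }

    // Update path: the key already exists in the table.
    // Delegates to getInsertValue so both paths share the same delete check.
    Optional<Map<String, Object>> combineAndGetUpdateValue(Map<String, Object> currentValue) {
        return getInsertValue();
    }
}
```

With the delete check placed in the insert path, a record whose "Op" is "D" 
yields an empty result whether or not the row was already in the table, so 
nothing is written for it in either case.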
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
