[ https://issues.apache.org/jira/browse/HUDI-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611772#comment-17611772 ]

Ethan Guo commented on HUDI-4958:
---------------------------------

I checked the commit metadata after insert, upsert (including deletes with
"_hoodie_is_deleted"), and delete operations using the Spark datasource, as
described in the Spark Guide.  The numInserts and numDeletes values look
accurate.  The logic for deriving numDeletes in HoodieMergeHandle also looks
OK.  We need to check whether the inaccuracy comes from the custom payload
implementation.
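To make the custom-payload hypothesis concrete, here is a simplified, hypothetical model of how an upsert batch might be classified into inserts, updates, and deletes. This is not Hudi's HoodieMergeHandle code. The function name, the (record_key, is_deleted) input shape, and the last-record-wins dedup (a stand-in for precombine, which really orders by a configured field) are all assumptions. The count_missing_deletes flag models a payload that counts a delete even for a key that was never written, which is one way numDeletes could exceed what numInserts can cover.

```python
from typing import Dict, Iterable, Set, Tuple


def count_upsert_stats(
    existing_keys: Set[str],
    batch: Iterable[Tuple[str, bool]],
    count_missing_deletes: bool = False,
) -> Dict[str, int]:
    """Classify incoming (record_key, is_deleted) pairs against existing keys.

    Hypothetical, simplified model (not Hudi's implementation):
    - duplicate keys in the batch collapse to the last record seen
      (a stand-in for precombine);
    - count_missing_deletes=True models a faulty payload that counts a
      delete even when the key was never written.
    """
    latest: Dict[str, bool] = {}
    for key, is_deleted in batch:
        latest[key] = is_deleted  # last record per key wins

    stats = {"numInserts": 0, "numUpdates": 0, "numDeletes": 0}
    for key, is_deleted in latest.items():
        if key in existing_keys:
            # Key already stored: either deleted or updated in place.
            stats["numDeletes" if is_deleted else "numUpdates"] += 1
        elif not is_deleted:
            # New key that is not deleted: a genuine insert.
            stats["numInserts"] += 1
        elif count_missing_deletes:
            # Over-count: there was nothing stored to delete.
            stats["numDeletes"] += 1
    return stats
```

With existing key "a" and a batch that deletes "a", inserts "b", and deletes the never-written "c", the correct model reports one insert and one delete, while the faulty variant reports two deletes.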

> Provide accurate numDeletes in commit metadata
> ----------------------------------------------
>
>                 Key: HUDI-4958
>                 URL: https://issues.apache.org/jira/browse/HUDI-4958
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Ethan Guo
>            Assignee: Ethan Guo
>            Priority: Major
>
> Summing {{numInserts - numDeletes}} across all the commits yields a negative
> total record count.  Need to check whether the numbers of inserts and deletes
> are accurate when both inserts and deletes exist in the same input batch for
> upsert.
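The running-total check described above can be sketched as follows. This is a minimal sketch, assuming each commit's metadata is available as a JSON string with a "partitionToWriteStats" map from partition path to a list of write-stat objects carrying "numInserts" and "numDeletes"; treat that layout as an assumption to verify against real commit files. The helper names and the (instant, json) input pairs are hypothetical. The function reports the first commit, in timeline order, at which the cumulative count drops below zero.

```python
import json
from typing import Iterable, Optional, Tuple


def commit_net_records(commit_json: str) -> int:
    """Sum numInserts - numDeletes over every write stat in one commit.

    Assumes a "partitionToWriteStats" map of partition -> list of stats.
    """
    meta = json.loads(commit_json)
    net = 0
    for stats in meta.get("partitionToWriteStats", {}).values():
        for stat in stats:
            net += stat.get("numInserts", 0) - stat.get("numDeletes", 0)
    return net


def first_negative_total(
    commits: Iterable[Tuple[str, str]],
) -> Optional[Tuple[str, int]]:
    """Walk (instant, commit_json) pairs in timeline order.

    Returns (instant, running_total) at the first commit where the running
    total of inserts minus deletes is negative, or None if it never is.
    """
    total = 0
    for instant, commit_json in commits:
        total += commit_net_records(commit_json)
        if total < 0:
            return instant, total
    return None
```

Pointing this at the commits of an affected table narrows the investigation to the first commit whose numDeletes cannot be explained by earlier inserts.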



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
