[ 
https://issues.apache.org/jira/browse/HUDI-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-6084:
---------------------------------
    Labels: pull-request-available  (was: )

> Ensure write operations to MDT do not absorb failures
> -----------------------------------------------------
>
>                 Key: HUDI-6084
>                 URL: https://issues.apache.org/jira/browse/HUDI-6084
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Prashant Wason
>            Assignee: Prashant Wason
>            Priority: Major
>              Labels: pull-request-available
>
> Issue 1:
> When we call compaction on the MDT, we do not check the return value. The 
> compaction operation may have reported errors in the WriteStatus, and 
> ignoring them leaves data missing from the MDT.
> MDT operations should never succeed when errors have occurred.
> Issue 2:
> Once a deltacommit has completed, the WriteStatus has already been used to 
> finalize the write and record the deltacommit action. The code was still 
> collecting the WriteStatus on the driver side to check for errors that 
> occurred during the write. Since the MDT write config has autoCommit, there 
> is no value in checking for errors at this stage because the deltacommit 
> has already completed. Also, the WriteStatus RDD may have been unpersisted, 
> and if no cached value is available, collecting it will lead to re-writing 
> of the deltacommit.
>  
> Fix:
> MDT uses FailOnFirstErrorWriteStatus, which is designed to throw an exception 
> when the first write error is detected. Hence, we do not need to check for 
> write errors explicitly: if any write error had occurred, the write itself 
> would have thrown an exception instead of completing.
> Also, we do not need to check the WriteStatus after the commit has completed.
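
The difference between a status object that absorbs errors and one that fails on the first error can be sketched as follows. This is a minimal, self-contained illustration, not Hudi's code: CollectingWriteStatus, FailFastWriteStatus and MetadataWriteSketch are hypothetical names, and the real FailOnFirstErrorWriteStatus may differ in its signature and exception type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a write status that records errors and carries on.
// A caller that never inspects hasErrors() silently absorbs the failure (Issue 1).
class CollectingWriteStatus {
    private final List<String> failedKeys = new ArrayList<>();

    void markFailure(String recordKey, Throwable cause) {
        failedKeys.add(recordKey);
    }

    boolean hasErrors() {
        return !failedKeys.isEmpty();
    }
}

// Hypothetical fail-fast variant: the first failure aborts the write, so no
// later check of the status is needed.
class FailFastWriteStatus extends CollectingWriteStatus {
    @Override
    void markFailure(String recordKey, Throwable cause) {
        throw new IllegalStateException("Error writing record " + recordKey, cause);
    }
}

class MetadataWriteSketch {
    // Simulated write: any key starting with "bad" fails. Assumption for the demo only.
    static CollectingWriteStatus write(List<String> keys, CollectingWriteStatus status) {
        for (String key : keys) {
            if (key.startsWith("bad")) {
                status.markFailure(key, new RuntimeException("simulated I/O error"));
            }
        }
        return status;
    }
}
```

With CollectingWriteStatus, write() returns normally even when a record fails, and the error is only visible if the caller remembers to call hasErrors(). With FailFastWriteStatus, the same call throws before the write can be treated as complete.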



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
