[ https://issues.apache.org/jira/browse/HUDI-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HUDI-6084:
---------------------------------
Labels: pull-request-available (was: )
> Ensure write operations to MDT do not absorb failures
> -----------------------------------------------------
>
> Key: HUDI-6084
> URL: https://issues.apache.org/jira/browse/HUDI-6084
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Prashant Wason
> Assignee: Prashant Wason
> Priority: Major
> Labels: pull-request-available
>
> Issue 1:
> When compaction is run on the MDT, its return value is not checked. The
> compaction may have reported errors in the WriteStatus, and ignoring them
> leaves data missing from the MDT. MDT operations should never succeed when
> errors occur.
> Issue 2:
> Once a deltacommit has completed, its WriteStatus has already been used to
> finalize the write and to write the deltacommit action. The code was still
> collecting the WriteStatus on the driver to check for errors that occurred
> during the write. Since the MDT write config enables autoCommit, checking for
> errors at this stage has no value: the deltacommit has already completed.
> Also, the WriteStatus RDD may have been unpersisted; if no cached value is
> available, collecting it recomputes the RDD and re-writes the deltacommit.
>
> Fix:
> MDT uses FailOnFirstErrorWriteStatus, which is designed to throw an exception
> as soon as the first write error is detected. Hence, there is no need to check
> for write errors explicitly: had any write error occurred, the write would
> not have completed and would have thrown an exception.
> There is likewise no need to check the WriteStatus after the commit has
> completed.
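The fail-fast idea behind the fix can be illustrated with a minimal sketch, written in Python for brevity. It is not Hudi's implementation: FailOnFirstErrorStatus, WriteError, write_records, and the "bad" key prefix used to simulate a failure are all hypothetical names chosen for illustration. It only models the behavior described above, where recording a failure raises immediately instead of being stored for later inspection.

```python
from typing import Iterable, List


class WriteError(Exception):
    """Hypothetical exception raised when a record write fails."""


class FailOnFirstErrorStatus:
    # Hypothetical model of a fail-fast write status: successes are recorded,
    # but a failure is never stored; it aborts the write by raising at once.
    def __init__(self) -> None:
        self.written: List[str] = []

    def mark_success(self, key: str) -> None:
        self.written.append(key)

    def mark_failure(self, key: str, cause: Exception) -> None:
        raise WriteError(f"write failed for record {key!r}") from cause


def write_records(records: Iterable[str],
                  status: FailOnFirstErrorStatus) -> FailOnFirstErrorStatus:
    # Hypothetical writer: keys starting with "bad" simulate a failed write.
    for key in records:
        try:
            if key.startswith("bad"):
                raise ValueError("simulated write failure")
            status.mark_success(key)
        except ValueError as exc:
            # Propagates WriteError out of the loop, stopping the write.
            status.mark_failure(key, exc)
    return status
```

With this design, any status object returned from a completed write_records call cannot contain errors, which is why a separate error check after the commit adds nothing.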
--
This message was sent by Atlassian Jira
(v8.20.10#820010)