Balajee Nagasubramaniam created HUDI-6416:
---------------------------------------------
Summary: Completion Markers for handling spark retries
Key: HUDI-6416
URL: https://issues.apache.org/jira/browse/HUDI-6416
Project: Apache Hudi
Issue Type: Bug
Reporter: Balajee Nagasubramaniam
During Spark stage retries, the Spark driver may already have all the
information it needs to reconcile the commit and proceed with the next steps,
while a stray executor is still writing to a data file and completes the write
later (before the JVM exits).
Such extra files are left on the dataset but were excluded from the commit
reconcile step, so they can surface to query engines as a data quality issue
in the form of duplicate records.
This change introduces completion markers, which try to prevent the dataset
from experiencing data quality issues in such corner-case scenarios.
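
The race described above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not Hudi's actual implementation: the names ToyTable, executor_finish and driver_reconcile are hypothetical, and the rule "a data file is kept only if it has a completion marker and was reported to the driver" is one plausible reading of what a completion marker could enforce.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical stand-in for a dataset directory (not a Hudi class)."""
    data_files: set = field(default_factory=set)
    completion_markers: set = field(default_factory=set)
    committed_files: set = field(default_factory=set)
    commit_closed: bool = False


def executor_finish(table: ToyTable, file_name: str) -> bool:
    """Executor side: the data file lands first, then the completion marker.

    Assumption: an attempt that finishes after the commit was reconciled
    cleans up its own file instead of publishing a marker.
    """
    table.data_files.add(file_name)
    if table.commit_closed:
        table.data_files.discard(file_name)  # stray attempt: remove its output
        return False
    table.completion_markers.add(file_name)
    return True


def driver_reconcile(table: ToyTable, reported_files: list) -> set:
    """Driver side: keep only files that were reported AND have a marker."""
    keep = set(reported_files) & table.completion_markers
    table.data_files &= keep          # drop files from failed or stray attempts
    table.committed_files = keep
    table.commit_closed = True
    return keep


table = ToyTable()
first_ok = executor_finish(table, "f1_attempt0.parquet")   # normal attempt
driver_reconcile(table, ["f1_attempt0.parquet"])           # commit is reconciled
stray_ok = executor_finish(table, "f1_attempt1.parquet")   # stray attempt ends late
```

In this toy run the late attempt publishes no marker and leaves no file behind, so a reader listing data_files sees a single copy of the records. A real implementation would also have to close the window between the "is the commit closed?" check and the marker write, which this single-threaded sketch does not model.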