Balajee Nagasubramaniam created HUDI-6416:
---------------------------------------------

             Summary: Completion Markers for handling spark retries
                 Key: HUDI-6416
                 URL: https://issues.apache.org/jira/browse/HUDI-6416
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Balajee Nagasubramaniam


During Spark stage retries, the Spark driver may already have all the 
information it needs to reconcile the commit and proceed with the next steps, 
while a stray executor is still writing a data file and finishes it later 
(before the JVM exits). 

Such extra files, left on the dataset but excluded from the commit 
reconciliation step, can surface as a data quality issue for query engines in 
the form of duplicate records.

This change introduces completion markers, which try to prevent the dataset 
from experiencing data quality issues in such corner-case scenarios.
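
As a rough illustration of the general idea only, and not the actual Hudi 
change, the toy sketch below models a driver that publishes a commit-level 
marker after reconciling, and a late executor that checks for that marker 
before making its file visible. The marker name `_COMMIT_DONE`, the functions 
`finalize_commit` and `finish_write`, and the local-directory layout are all 
hypothetical. The check-then-write step is not atomic here, so this narrows 
the window rather than closing it.

```python
import os
import tempfile

COMMIT_DONE = "_COMMIT_DONE"  # hypothetical marker file name, not a Hudi name


def finalize_commit(table_dir, committed_files):
    """Driver side: drop files outside the commit, then publish the marker."""
    for name in os.listdir(table_dir):
        if name.endswith(".parquet") and name not in committed_files:
            os.remove(os.path.join(table_dir, name))
    with open(os.path.join(table_dir, COMMIT_DONE), "w") as f:
        f.write("\n".join(sorted(committed_files)))


def finish_write(table_dir, file_name, payload):
    """Executor side: refuse to expose a file once the commit is complete."""
    if os.path.exists(os.path.join(table_dir, COMMIT_DONE)):
        return False  # stray attempt: the commit was already reconciled
    with open(os.path.join(table_dir, file_name), "wb") as f:
        f.write(payload)
    return True


def demo():
    with tempfile.TemporaryDirectory() as table_dir:
        finish_write(table_dir, "a.parquet", b"rows")        # normal attempt
        finalize_commit(table_dir, {"a.parquet"})            # driver commits
        stray = finish_write(table_dir, "b.parquet", b"dup")  # late executor
        visible = sorted(
            n for n in os.listdir(table_dir) if n.endswith(".parquet")
        )
        return stray, visible
```

Calling `demo()` shows the late write being rejected, so only the committed 
file remains visible to readers.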



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
