hudi-bot opened a new issue, #16045: URL: https://github.com/apache/hudi/issues/16045
During Spark stage retries, the Spark driver may already have all the information it needs to reconcile the commit and proceed with the next steps, while a stray executor is still writing a data file and completes it later (before its JVM exits). Such extra files, left in the dataset but excluded from the commit reconciliation step, can surface in query engines as duplicate records, which is a data quality issue. This change introduces completion markers, which aim to protect the dataset from data quality issues in these corner cases.

## JIRA info

- Link: https://issues.apache.org/jira/browse/HUDI-6416
- Type: Bug
- Epic: https://issues.apache.org/jira/browse/HUDI-7967
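
To make the stray-writer scenario in the description concrete, here is a toy, in-memory sketch. It is not Hudi's implementation: the function name `reconcile`, the file names, and the assumption that a writer records a creation marker before writing and a completion marker after finishing are all hypothetical and used only for illustration.

```python
def reconcile(files_on_storage, committed, creation_markers, completion_markers):
    """Toy model of a commit reconciliation step (hypothetical, not Hudi code).

    All arguments are sets of file names.
    Returns (stray_now, possibly_in_flight).
    """
    # Files already visible on storage that the commit does not reference.
    # These can be removed right away.
    stray_now = files_on_storage - committed

    # Writes that were started but never reported completion, and whose file
    # is not visible yet. A stray executor could still finish such a file
    # after reconciliation, so it needs a follow-up check.
    possibly_in_flight = (creation_markers - completion_markers) - files_on_storage

    return stray_now, possibly_in_flight


# Illustrative scenario: attempt 0 for f2 is still writing when the driver
# commits attempt 1, and a failed attempt for f3 left a partial file behind.
stray, in_flight = reconcile(
    files_on_storage={"f1", "f2_attempt1", "f3_attempt0"},
    committed={"f1", "f2_attempt1"},
    creation_markers={"f1", "f2_attempt0", "f2_attempt1", "f3_attempt0"},
    completion_markers={"f1", "f2_attempt1"},
)
print(sorted(stray))
print(sorted(in_flight))
```

Without completion information, the driver in this toy model has no way to tell that `f2_attempt0` may still appear after the commit; with it, the unfinished write is at least identifiable.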
