nbalajee opened a new pull request, #9035:
URL: https://github.com/apache/hudi/pull/9035

   …retries
   
   ### Change Logs
   During Spark stage retries, the Spark driver may have all the information it 
needs to reconcile the commit and proceed with the next steps, while a stray 
executor is still writing to a data file and completes later (after the 
reconcile step, but before the JVM exits).
   
   Such extra files, left on the dataset but excluded from the commit reconcile 
step, could show up as a data quality issue (duplicate records) for query 
engines.
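   
   The race described above can be replayed deterministically with a small 
in-memory model. This is only a sketch: the class, method, and file names are 
hypothetical and are not Hudi classes, and storage is modeled as a plain set of 
file names.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical in-memory model of the race; names are illustrative only.
public class StrayFileRace {

    /** Reconcile step: keep only the files the driver knows were committed. */
    public static void reconcile(Set<String> storage, Set<String> committed) {
        storage.retainAll(committed);
    }

    /** Replays the sequence of events and returns the files left on storage. */
    public static Set<String> simulate() {
        Set<String> storage = new LinkedHashSet<>();
        Set<String> committed = new LinkedHashSet<>();

        // The retried attempt finishes first and is reported to the driver.
        storage.add("file1_token2.parquet");
        committed.add("file1_token2.parquet");

        // The driver reconciles; the stray file does not exist yet,
        // so there is nothing for this step to clean up.
        reconcile(storage, committed);

        // The stray executor from the earlier attempt completes afterwards.
        storage.add("file1_token1.parquet");
        return storage;
    }

    public static void main(String[] args) {
        // Two files for the same file group remain, one of them uncommitted.
        System.out.println(simulate());
    }
}
```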
   
   This change introduces completion markers, which try to prevent the dataset 
from experiencing data quality issues in such corner-case scenarios.
   
   A planned future change would prevent the second/subsequent tasks/executors 
from creating additional files (with a different write token) and have them 
reuse the successfully completed files instead.
   
   ### Impact
   Improved reliability and data quality when infrastructure-related failures 
result in stage/task retries.
   
   ### Risk level (write none, low medium or high below)
   Low/Medium:  This change has been in production for about a year now at Uber.
   
   ### Documentation Update
   ENFORCE_COMPLETION_MARKER_CHECKS - Configures whether to fail the job or 
continue with retries when a write to an already completed file is retried. 
With a planned change, this would allow the second/subsequent attempt to 
create a file to succeed by using the previously created copy of the data.
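   
   A minimal sketch of how such a check might behave, assuming a completion 
marker is recorded per file ID once a write attempt finishes. Everything here 
(class, methods, file-name format) is hypothetical and in-memory; it is not the 
code in this PR.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a completion-marker check; not Hudi's actual classes.
public class CompletionMarkerSketch {
    private final boolean enforceCompletionMarkerChecks;
    // File IDs for which some attempt has already finished writing.
    private final Set<String> completionMarkers = new HashSet<>();
    private final List<String> dataFiles = new ArrayList<>();

    public CompletionMarkerSketch(boolean enforceCompletionMarkerChecks) {
        this.enforceCompletionMarkerChecks = enforceCompletionMarkerChecks;
    }

    /** One write attempt for a file ID; returns the data file name it created. */
    public String write(String fileId, String writeToken) {
        if (completionMarkers.contains(fileId) && enforceCompletionMarkerChecks) {
            // Flag enabled: fail instead of producing a second copy.
            throw new IllegalStateException("File already completed: " + fileId);
        }
        // Flag disabled: the retry continues and writes another file
        // under a different write token.
        String fileName = fileId + "_" + writeToken + ".parquet";
        dataFiles.add(fileName);
        completionMarkers.add(fileId); // record completion for later attempts
        return fileName;
    }

    public List<String> dataFiles() {
        return dataFiles;
    }
}
```

   With the flag disabled, two attempts for the same file ID leave two data 
files behind; with it enabled, the second attempt throws.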
   
   ENFORCE_FINALIZE_WRITE_CHECK - Configures whether to fail the job if the 
commit reconciliation step has completed and the write stage is then retried 
(say, a block from the writeStatus RDD is found to be lost while iterating over 
write statuses for the record index update). A single executor writing to 
multiple files (data spilling over to more than one file), combined with a 
stage failure after reconciliation, results in data quality issues. This flag 
helps by failing the job instead of creating an incorrect commit.
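   
   The intent of this flag can be sketched as a guard on writes that arrive 
after reconciliation. This is an illustrative model under the assumption that 
a "finalized" state is visible to the writer; the names are hypothetical and 
do not reflect the PR's implementation.

```java
// Hypothetical model of a finalize-write check; not Hudi's actual classes.
public class FinalizeWriteCheckSketch {
    private final boolean enforceFinalizeWriteCheck;
    private boolean finalized = false;
    private int filesWritten = 0;

    public FinalizeWriteCheckSketch(boolean enforceFinalizeWriteCheck) {
        this.enforceFinalizeWriteCheck = enforceFinalizeWriteCheck;
    }

    /** Driver side: commit reconciliation has completed. */
    public void finalizeWrite() {
        finalized = true;
    }

    /** Executor side: one file written by the write stage. */
    public void writeFile() {
        if (finalized && enforceFinalizeWriteCheck) {
            // A retry after reconciliation would add files the commit
            // does not account for, so fail the job instead.
            throw new IllegalStateException("Write stage retried after finalize");
        }
        filesWritten++;
    }

    public int filesWritten() {
        return filesWritten;
    }
}
```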
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   - [x] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
