bhasudha commented on code in PR #9709:
URL: https://github.com/apache/hudi/pull/9709#discussion_r1327962056

##########
website/docs/rollbacks.md:
##########
@@ -0,0 +1,67 @@
+---
+title: Partially Failed Commits
+toc: true
+---
+
+## Partially failed commits
+
+Your pipelines could fail due to numerous reasons like crashes, valid bugs in the code, unavailability of any external
+third party system (like lock provider), or user could kill mid-way to change some properties. A well designed system should
+detect such partially failed commits and ensure dirty data is not exposed to the read queries and also clean them up.
+We have already took a peek into Hudi’s timeline which forms the core for reader and writer isolation. If a commit has
+not transitioned to complete as per the hudi timeline, the readers will ignore the data from the respective write.
+And so partially failed writes are never read by any readers (for all query types). But the curious question is, how
+does the partially written data is eventually deleted? Does it require manual command to be executed from time to time
+or should it be automatically handled by the system?
+
+### Handling partially failed commits
+Hudi has a lot of platformization built in so as to ease the operationalization of lakehouse tables. Once such feature

Review Comment:
   ```suggestion
   Hudi has a lot of platformization built in so as to ease the operationalization of lakehouse tables. One such feature
   ```
