chenshzh opened a new pull request, #7697:
URL: https://github.com/apache/hudi/pull/7697

   ### Change Logs
   
   Fixes Flink WriteMetadataEvents being lost when an instant is committed: this PR adds a wait-and-ack mechanism that runs when `StreamWriteOperatorCoordinator` executes `notifyCheckpointComplete`.
   
   1. **Reasons why we need an ack mechanism:**
   
   In some extreme cases, the write functions complete their checkpoints and send their metadata events back, but because of network latency the coordinator's `notifyCheckpointComplete` may be invoked before `handleEventFromOperator` has handled those events.
   
   As a result, the instant is mistakenly committed with incomplete metadata events.
   The wait-and-ack mechanism rescues the commit by verifying that the last metadata event (`lastBatch = true`) from every task has been received before the commit proceeds.
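
The wait-and-ack idea described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `MetaEventAckTracker`, `onEvent`, and `awaitAll` are hypothetical names, and the real coordinator tracks full event objects rather than a boolean per task.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper (not a Hudi class): records which write tasks have
// delivered their final (lastBatch = true) metadata event.
class MetaEventAckTracker {
  private final boolean[] lastBatchReceived;
  private int pending;
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition allReceived = lock.newCondition();

  MetaEventAckTracker(int parallelism) {
    this.lastBatchReceived = new boolean[parallelism];
    this.pending = parallelism;
  }

  // Called on the event-handling path for each incoming metadata event.
  void onEvent(int taskId, boolean lastBatch) {
    lock.lock();
    try {
      if (lastBatch && !lastBatchReceived[taskId]) {
        lastBatchReceived[taskId] = true;
        if (--pending == 0) {
          allReceived.signalAll();
        }
      }
    } finally {
      lock.unlock();
    }
  }

  // Called before committing. Returns true only if every task has acked
  // within timeoutMs; returns false on timeout or interruption.
  boolean awaitAll(long timeoutMs) {
    long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    lock.lock();
    try {
      while (pending > 0) {
        if (nanos <= 0L) {
          return false;
        }
        nanos = allReceived.awaitNanos(nanos);
      }
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      lock.unlock();
    }
  }

  public static void main(String[] args) {
    MetaEventAckTracker tracker = new MetaEventAckTracker(2);
    tracker.onEvent(0, true);
    System.out.println("one of two tasks acked: " + tracker.awaitAll(100));
    tracker.onEvent(1, true);
    System.out.println("both tasks acked: " + tracker.awaitAll(100));
  }
}
```

In this sketch the commit path would call `awaitAll` with the configured timeout and skip or retry the commit when it returns false; what the PR actually does on timeout is not stated here.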
   
   2. **Reasons why we need a dedicated ack thread rather than the common single-thread executor:**
   
   If the verification ran in the single-thread executor, `notifyCheckpointComplete` would occupy that thread while waiting for events that `handleEventFromOperator` can only handle on the same thread, so the two would deadlock.
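
The deadlock can be reproduced in miniature with plain `java.util.concurrent` primitives. The sketch below is an illustration under assumptions, not Hudi code: `SingleThreadAckDemo` and `run` are hypothetical names, a `CountDownLatch` stands in for "all metadata events handled", and the wait is bounded by a timeout so the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: shows why the waiter must not share the single
// thread that also handles the events it is waiting for.
class SingleThreadAckDemo {

  // Returns true if the waiter observed the event within timeoutMs.
  static boolean run(boolean dedicatedAckThread, long timeoutMs) {
    ExecutorService coordinatorExecutor = Executors.newSingleThreadExecutor();
    ExecutorService ackExecutor =
        dedicatedAckThread ? Executors.newSingleThreadExecutor() : coordinatorExecutor;
    CountDownLatch eventHandled = new CountDownLatch(1);
    CountDownLatch waiterStarted = new CountDownLatch(1);
    try {
      // Stand-in for the verification step of notifyCheckpointComplete.
      Future<Boolean> verified = ackExecutor.submit(() -> {
        waiterStarted.countDown();
        return eventHandled.await(timeoutMs, TimeUnit.MILLISECONDS);
      });
      waiterStarted.await(); // make sure the waiter already holds its thread
      // Stand-in for handleEventFromOperator; always runs on the coordinator thread.
      coordinatorExecutor.execute(eventHandled::countDown);
      return verified.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } catch (ExecutionException e) {
      return false;
    } finally {
      coordinatorExecutor.shutdownNow();
      ackExecutor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("shared single-thread executor: " + run(false, 300));
    System.out.println("dedicated ack thread: " + run(true, 3000));
  }
}
```

With the shared executor the handler task is queued behind the waiter and cannot run until the waiter gives up; without the timeout it would never run at all. With a separate ack thread the handler runs immediately and the waiter is released.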
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   1. `write.metadata.event.ack.timeout`:  
   
   Default: `10_000L` (milliseconds, i.e. 10 seconds).
   
   Maximum time that `StreamWriteOperatorCoordinator` `notifyCheckpointComplete` waits in the ack thread for all metadata events from the tasks to be received and handled.
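
As a rough illustration of how the option behaves, the sketch below resolves the timeout from a plain string map and falls back to the documented default. `AckTimeoutOption` and `resolve` are hypothetical; the PR itself presumably declares the option through Hudi's Flink option classes, which are not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not a Hudi class): resolves the ack timeout from
// user-supplied options, using the documented default when it is absent.
class AckTimeoutOption {
  static final String KEY = "write.metadata.event.ack.timeout";
  static final long DEFAULT_MS = 10_000L; // 10 seconds

  static long resolve(Map<String, String> options) {
    String raw = options.get(KEY);
    return raw == null ? DEFAULT_MS : Long.parseLong(raw.trim());
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    System.out.println("default: " + resolve(options) + " ms");
    options.put(KEY, "30000");
    System.out.println("overridden: " + resolve(options) + " ms");
  }
}
```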
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
