yihua opened a new pull request, #6536:
URL: https://github.com/apache/hudi/pull/6536

   ### Change Logs
   
   For Hudi Deltastreamer with async cleaning, if the Spark job fails in the 
middle of cleaning and leaves a clean instant inflight in the timeline, the 
retried Spark job may not resume the inflight clean action when 
`hoodie.clean.allow.multiple` is `false`, i.e., when multiple clean schedules 
are disabled.  This is caused by a bug in the conditional check, which is used 
to gate both clean service scheduling and execution.
   
   The fix is to let clean service execution proceed regardless of whether a 
new clean action is scheduled, so that the inflight clean action can 
complete.
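
   As a rough illustration of the control-flow change described above (not the 
actual Hudi code), the toy model below contrasts a single check that gates both 
scheduling and execution with a check that gates scheduling only.  All names 
here (`Timeline`, `should_schedule_clean`, `clean_buggy`, `clean_fixed`) are 
hypothetical and exist only for this sketch; the real logic lives in the Hudi 
write client and may be structured differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Timeline:
    """Toy stand-in for the Hudi timeline; tracks clean instants only."""
    pending_cleans: List[str] = field(default_factory=list)    # requested or inflight
    completed_cleans: List[str] = field(default_factory=list)


def should_schedule_clean(timeline: Timeline, allow_multiple: bool) -> bool:
    """Allow a new clean schedule unless multiple cleans are disabled
    (modeling hoodie.clean.allow.multiple=false) and one is already pending."""
    return allow_multiple or not timeline.pending_cleans


def _run_pending(timeline: Timeline) -> None:
    """Execute every pending clean instant, oldest first."""
    while timeline.pending_cleans:
        timeline.completed_cleans.append(timeline.pending_cleans.pop(0))


def clean_buggy(timeline: Timeline, allow_multiple: bool, new_instant: str) -> None:
    """Assumed pre-fix shape: one check gates both scheduling and execution."""
    if should_schedule_clean(timeline, allow_multiple):
        timeline.pending_cleans.append(new_instant)
        _run_pending(timeline)
    # Otherwise nothing runs, so an inflight clean is never resumed.


def clean_fixed(timeline: Timeline, allow_multiple: bool, new_instant: str) -> None:
    """Assumed post-fix shape: the check gates scheduling only."""
    if should_schedule_clean(timeline, allow_multiple):
        timeline.pending_cleans.append(new_instant)
    _run_pending(timeline)  # always runs, so inflight cleans can finish


if __name__ == "__main__":
    # A previous run failed mid-clean and left instant "001" inflight.
    stuck = Timeline(pending_cleans=["001"])
    clean_buggy(stuck, allow_multiple=False, new_instant="002")
    print(stuck.pending_cleans)      # ['001'] (still inflight)

    clean_fixed(stuck, allow_multiple=False, new_instant="002")
    print(stuck.completed_cleans)    # ['001'] (resumed and completed)
```

   In this model, no new clean is scheduled while one is pending and multiple 
cleans are disabled, which preserves the intent of the setting; only the 
execution step is decoupled from that check.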
   
   ### Impact
   
   **Risk level: medium**
   Tested locally by setting `hoodie.clean.allow.multiple` to `false` 
and injecting errors into the clean execution in Deltastreamer to leave an 
inflight clean instant in the timeline.  With this fix, the clean table 
service is able to make progress after retrying.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
