C0urante opened a new pull request, #16784:
URL: https://github.com/apache/kafka/pull/16784

   There's a small chance of a race condition that can record an incorrect stage for callbacks in `DistributedHerder::addRequest`.
   
   The order of operations on trunk is:
   1. Add the request to the queue
   2. If necessary, wake up the tick thread from polling the group coordinator
   3. Record the current tick thread stage on the request callback
   
   This is valid in most cases. However, when a separate thread takes "ownership" of the request callback and begins recording stages on it, the stage recorded in step 3 above may incorrectly overwrite the stages reported by that separate thread if the request has already started running by that point.
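   To make the interleaving concrete, here's a deterministic, single-threaded replay of the trunk ordering. `StagedCallback`, `replay()`, and the stage labels are hypothetical stand-ins, not the real Connect classes; the sketch only shows that recording the tick thread stage last clobbers whatever the other thread already reported.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class TrunkOrderingRace {

    // Hypothetical stand-in for a herder request callback that remembers
    // the most recently recorded stage (not the real Connect class).
    static class StagedCallback {
        private volatile String stage = "none";

        void recordStage(String stage) {
            this.stage = stage;
        }

        String stage() {
            return stage;
        }
    }

    // Replays the problematic interleaving in a fixed order on one thread,
    // so the outcome is deterministic.
    static String replay() {
        Queue<Runnable> requests = new ArrayDeque<>();
        StagedCallback callback = new StagedCallback();
        String tickThreadStage = "awaiting next tick"; // hypothetical label

        // Step 1: add the request to the queue. When it runs, another thread
        // (simulated inline here) takes ownership of the callback and records
        // its own stage.
        requests.add(() -> callback.recordStage("validating connector config"));

        // Step 2 (waking the tick thread) is omitted. The request starts
        // running before the caller gets to step 3.
        requests.poll().run();

        // Step 3: the caller records the tick thread stage, overwriting the
        // more accurate stage reported by the other thread.
        callback.recordStage(tickThreadStage);

        return callback.stage();
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints "awaiting next tick"
    }
}
```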
   
   This is the case when submitting new connector configs to the config topic, where we first validate the connector config on a separate thread.
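   A minimal sketch of one way to close the window, assuming the fix is simply to record the tick thread stage before the request is enqueued (the diff itself isn't reproduced in this message). `ReorderedAddRequest`, `StagedCallback`, `Request`, and the stage labels are hypothetical. Because the stage is recorded before the request becomes visible to any other thread, a stage later reported by the validation thread always wins.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class ReorderedAddRequest {

    // Hypothetical callback that remembers the last stage recorded on it.
    static class StagedCallback {
        private volatile String stage = "none";
        void recordStage(String stage) { this.stage = stage; }
        String stage() { return stage; }
    }

    // Hypothetical request: the work to run plus the callback it reports to.
    static final class Request {
        final Runnable action;
        final StagedCallback callback;
        Request(Runnable action, StagedCallback callback) {
            this.action = action;
            this.callback = callback;
        }
    }

    private final BlockingQueue<Request> queue = new LinkedBlockingQueue<>();
    private volatile String tickThreadStage = "awaiting next tick"; // hypothetical

    // Assumed fix: record the tick thread stage *before* enqueueing.
    void addRequest(Runnable action, StagedCallback callback) {
        callback.recordStage(tickThreadStage);     // formerly step 3
        queue.add(new Request(action, callback));  // formerly step 1
        // Waking the tick thread (step 2) is omitted from this sketch.
    }

    static String demo() {
        ReorderedAddRequest herder = new ReorderedAddRequest();
        ExecutorService validationExecutor = Executors.newSingleThreadExecutor();
        StagedCallback callback = new StagedCallback();
        CountDownLatch validated = new CountDownLatch(1);
        try {
            // Connector-config request whose validation runs on another
            // thread, which records its own stage on the callback.
            herder.addRequest(() -> validationExecutor.submit(() -> {
                callback.recordStage("validating connector config");
                validated.countDown();
            }), callback);

            // Act as the tick thread: dequeue the request and run it.
            herder.queue.poll().action.run();
            if (!validated.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("validation did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            validationExecutor.shutdown();
        }
        return callback.stage();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "validating connector config"
    }
}
```

   The queue hand-off and the executor submission both establish happens-before edges, so the validation thread's write is guaranteed to follow the caller's write regardless of scheduling.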
   
   This may explain some of the flaky failures we've seen for the `BlockingConnectorTest`, like the recent one [here](https://ge.apache.org/s/omkkps6tf2e2w/tests/task/:connect:runtime:test/details/org.apache.kafka.connect.integration.BlockingConnectorTest/testBlockInConnectorConfig()?top-execution=1) on trunk.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.