jihoonson opened a new pull request #10736:
URL: https://github.com/apache/druid/pull/10736


   ### Description
   
   `AbstractBatchIndexTask.tryTimeChunkLock()` is called `isReady()` which is 
called in the Overlord to check if the task is ready for execution. Since the 
Overlord only checks the result of `isReady()` but does nothing with locks by 
itself, locking individual interval can lead to deadlock. Imagine that there 
are two tasks that want to work on overlapping intervals. One task could lock 
some of intervals but failed for others for some reason. Later, another task 
could lock those intervals where the first task failed to lock. Now they would 
wait for each other to release locks they possess. 
   
   This PR changes to lock an umbrella interval of the input intervals instead 
of locking them individually. This is an easy fix, but I'm not sure if it's the 
best. Maybe the Overlord should take care of those tasks who are not ready yet 
and release any locks they acquired during `isReady()`. This approach will work 
better when the input intervals are not consecutive (the approach locking an 
umbrella interval will unnecessarily lock some intervals even though the task 
doesn't need them). We can get back to this approach when this turns out to be 
an issue.
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [x] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org

Reply via email to