jasonk000 opened a new pull request #11272:
URL: https://github.com/apache/druid/pull/11272


   This PR is easiest viewed in split! (imo).
   
   Fixes #11140 (
   the task view only).
   
   ### Description
   
   See #11140.
   
   This PR unlocks concurrency and improves performance of the 
`HeapMemoryTaskStorage` task storage component.
   
   The patch introduces two key changes:
   1. Replace `giant.lock` with (a) a ConcurrentHashMap for the task list 
queries and (b) synchronized blocks for remaining task audit log and task locks.
   2. Simplify algorithm for getting tasks (as used by overlord /tasks API) so 
that tasks are filtered _before_ they are sorted, and limited _before_ they are 
materialized. As screenshots in #11140 show this results in big improvements on 
our services.
   
   ### Design decisions
   
   - I did note that both task locks and task audits are (mostly) distinct 
APIs, sharing only the task ID and the task removal function. It is possible to 
also make these lock-free, however in the interest of minimizing surface area 
of change I opted not to do this.
   - Part of the reason for this is that it is possible to create audit logs 
and lock entries for tasks that do not exist in the tasks list. I am not sure 
if this is intended so I opted to retain existing behaviour. Alternatively, if 
this is not intended, and locks / logs can only exist for previously created 
tasks, then I would bring the locks & logs into the ConcurrentHashMap which 
would allow elimination of all locking.
   - I did look at creating a ReaderWriterLock to replace the existing giant 
lock, however in our internal reviews the code was actually less clear to read 
than a simpler lock-free implementation.
   
   - I opted for a slight reorder of `getLocks` and `removeTasksOlderThan` 
since it keeps their related locks map codepaths closer in the file. I could 
have left this but I find it easier to keep the same synchronized sections on 
the same page.
   
   ### Checklist
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
         - see below!
   - [x] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [x] been tested in a test Druid cluster.
   
   
   ### Key questions
   
   - Would appreciate input on testing scope
       - what/ specific additional concurrency tests would be desired. 
OverlordTest?
       - It seems like from the `diff-test-coverage` report that I'd be well 
served by adding tests to cover items such as whether the comparator works or 
filter/map rules that were not previously covered. I can add these if desired. 
Where do you think the most value is?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to