jasonk000 opened a new pull request #11272:
URL: https://github.com/apache/druid/pull/11272
This PR is easiest viewed in split! (imo).
Fixes #11140 (
the task view only).
### Description
See #11140.
This PR unlocks concurrency and improves performance of the
`HeapMemoryTaskStorage` task storage component.
The patch introduces two key changes:
1. Replace `giant.lock` with (a) a ConcurrentHashMap for the task list
queries and (b) synchronized blocks for remaining task audit log and task locks.
2. Simplify algorithm for getting tasks (as used by overlord /tasks API) so
that tasks are filtered _before_ they are sorted, and limited _before_ they are
materialized. As screenshots in #11140 show this results in big improvements on
our services.
### Design decisions
- I did note that both task locks and task audits are (mostly) distinct
APIs, sharing only the task ID and the task removal function. It is possible to
also make these lock-free, however in the interest of minimizing surface area
of change I opted not to do this.
- Part of the reason for this is that it is possible to create audit logs
and lock entries for tasks that do not exist in the tasks list. I am not sure
if this is intended so I opted to retain existing behaviour. Alternatively, if
this is not intended, and locks / logs can only exist for previously created
tasks, then I would bring the locks & logs into the ConcurrentHashMap which
would allow elimination of all locking.
- I did look at creating a ReaderWriterLock to replace the existing giant
lock, however in our internal reviews the code was actually less clear to read
than a simpler lock-free implementation.
- I opted for a slight reorder of `getLocks` and `removeTasksOlderThan`
since it keeps their related locks map codepaths closer in the file. I could
have left this but I find it easier to keep the same synchronized sections on
the same page.
### Checklist
This PR has:
- [x] been self-reviewed.
- [ ] using the [concurrency
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
- see below!
- [x] added comments explaining the "why" and the intent of the code
wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [x] been tested in a test Druid cluster.
### Key questions
- Would appreciate input on testing scope
- what/ specific additional concurrency tests would be desired.
OverlordTest?
- It seems like from the `diff-test-coverage` report that I'd be well
served by adding tests to cover items such as whether the comparator works or
filter/map rules that were not previously covered. I can add these if desired.
Where do you think the most value is?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]