georgew5656 opened a new pull request, #16512:
URL: https://github.com/apache/druid/pull/16512
Currently it's possible to submit a very large task payload and have it OOM
the Overlord if the request fails and the whole payload is logged in
SQLMetadataStorageActionHandler.insertEntryWithHandle.
If the request is larger than max_allowed_packet for a MySQL metadata store,
it will always fail. Since very large task payloads seem to cause Overlord
instability in general, I think it makes sense to limit the size of task
payloads at the task queue level.
### Description
Add a new config that sets a limit for task payload sizes. Throw an exception
if the limit is exceeded.
The default limit of 60 MB is based on the 64 MB default value of
max_allowed_packet in MySQL 8+.
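
A minimal sketch of the kind of check described above, not the actual `TaskQueue` change. The class name `TaskPayloadGuard`, its methods, and the choice of `IllegalArgumentException` are all hypothetical, and the payload is assumed to already be serialized to a JSON string. Treating 60 MB as 60 * 1024 * 1024 bytes is also an assumption.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names come from Druid.
final class TaskPayloadGuard {
    // Assumed reading of "60 MB": 60 MiB, leaving headroom below the
    // 64 MB max_allowed_packet default mentioned in the description.
    static final long DEFAULT_MAX_PAYLOAD_BYTES = 60L * 1024 * 1024;

    private final long maxPayloadBytes;

    TaskPayloadGuard(long maxPayloadBytes) {
        if (maxPayloadBytes <= 0) {
            throw new IllegalArgumentException("maxPayloadBytes must be positive");
        }
        this.maxPayloadBytes = maxPayloadBytes;
    }

    // Size of the serialized payload in bytes when encoded as UTF-8.
    static long serializedSize(String serializedPayload) {
        return serializedPayload.getBytes(StandardCharsets.UTF_8).length;
    }

    // Throws if the payload is over the limit; otherwise returns normally.
    void validate(String taskId, String serializedPayload) {
        long size = serializedSize(serializedPayload);
        if (size > maxPayloadBytes) {
            // Report only the sizes, never the payload itself, so the
            // failure path cannot produce a huge log or error message.
            throw new IllegalArgumentException(
                String.format(
                    "Task[%s] payload is %d bytes, which exceeds the limit of %d bytes",
                    taskId, size, maxPayloadBytes));
        }
    }

    public static void main(String[] args) {
        TaskPayloadGuard guard = new TaskPayloadGuard(DEFAULT_MAX_PAYLOAD_BYTES);
        guard.validate("example_task", "{\"type\":\"noop\"}");
        System.out.println("payload accepted");
    }
}
```

Because the check runs on the serialized bytes, it costs one extra serialization per submitted task, which is the trade-off discussed next.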
Ideally I would use the HTTP request Content-Length header to calculate the
size of the task payload rather than re-serializing it in memory, but we also
call taskQueue.add directly from supervisors, so those calls would bypass the
check. If others think this is acceptable and it would be better to check
Content-Length in OverlordResource, I am fine with changing this logic.
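
For comparison, a rough sketch of what the Content-Length alternative could look like. This is not code from the PR: `ContentLengthPrecheck` and its methods are hypothetical, and the header value is assumed to be handed in as a raw string by whatever HTTP layer is in use.

```java
import java.util.OptionalLong;

// Hypothetical illustration only; none of these names come from Druid.
final class ContentLengthPrecheck {
    // Parses a Content-Length header value. Returns empty when the header
    // is absent, malformed, or negative, i.e. when the size is unknown.
    static OptionalLong parseContentLength(String headerValue) {
        if (headerValue == null) {
            return OptionalLong.empty();
        }
        try {
            long value = Long.parseLong(headerValue.trim());
            return value >= 0 ? OptionalLong.of(value) : OptionalLong.empty();
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    // True only when the declared length is known and over the limit.
    // An unknown length is not rejected here; it needs a later check.
    static boolean exceedsLimit(String headerValue, long maxBytes) {
        OptionalLong length = parseContentLength(headerValue);
        return length.isPresent() && length.getAsLong() > maxBytes;
    }

    public static void main(String[] args) {
        long limit = 60L * 1024 * 1024;
        System.out.println(exceedsLimit("1024", limit));
        System.out.println(exceedsLimit(String.valueOf(limit + 1), limit));
    }
}
```

A header-based check avoids re-serializing the payload, but it only sees HTTP submissions that declare a length. Requests sent with chunked transfer encoding carry no Content-Length, and tasks added directly by supervisors never pass through the HTTP layer at all.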
#### Release note
- Added a new guardrail that rejects submitted tasks whose payloads are too large.
##### Key changed/added classes in this PR
* `TaskQueue`
This PR has:
- [X] been self-reviewed.
- [ ] using the [concurrency
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
(Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] a release note entry in the PR description.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [X] added comments explaining the "why" and the intent of the code
wherever would not be obvious for an unfamiliar reader.
- [X] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [ ] added integration tests.
- [X] been tested in a test Druid cluster.