georgew5656 opened a new pull request, #14887:
URL: https://github.com/apache/druid/pull/14887
This change is meant to fix a issue where passing too large of a task
payload to the mm-less task runner will cause the peon to fail to startup
because the payload is passed (compressed) as a environment variable
(TASK_JSON). In linux systems the limit for a environment variable is commonly
128KB, for windows systems less than this. Setting a env variable longer than
this results in a bunch of "Argument list too long" errors.
### Description
The goal of this patch is to prevent larger tasks from failing with mm-less
ingestion.
To address the immediate problem (setting environment variables that are too
large), I added a additional config for the KubernetesTaskRunner
(druid.indexer.runner.taskPayloadAsEnvVariable) that defaults to true but can
be optionally set to false. Setting this config to false will cause the K8s
adapters to not set the task payload as the TASK_JSON env variable. This
prevents the Jobs from failing to come up.
We still need to pass the task.json payload to the peons somehow. I explored
three options for this, and ended up going with option 3.
1. Using configmaps to store the task payload and then mounting them onto
the created peon pods.
I decided not to go with this option because configmaps still have a 1MB
size limit and I was concerned with the KubernetesTaskRunner having to manage a
bunch of configmaps in addition to jobs. Having this many configmaps also
pollutes K8s metadata, making it hard to see anything else going on.
2. Updating CliPeon to use the getTaskPayload endpoint on the overlord to
pull the task.json payload on startup. This didn't work because we currently
have a guice injector in the peon that requires the task.json be available at
injection time. In order to pull the task.json from the overlord, we need to
use the ServiceLocator class which is only available once the peon lifecycle
has already started (after injection). Changing this would have required many
changes to the code so I didn't want to do it. Additionally, I would have had
to deprecate the toTask interface on the overlords since there would be no way
for the overlord to turn a K8s Job into a task definition.
3. Push the task payload into task logs deep storage and have the peon read
the payload.
I ended up going with this option because it was the most simple to
implement and the most future-proof (no worry about task payloads getting
larger than 1MB). The task logs killer will automatically delete the task.json
in deep storage alongside he task logs whenever it is run.
To implement option 3 I included several changes
- Introduced a new interface (TaskPayloadManager) that exposes two methods
(push task payload, pull task payload) and is implemented by TaskLogs. (I only
implemented it for S3TaskLogs, but it should be easy to implement for other
deep storage systems).
- Introduce a new config in TaskConfig
(druid.indexer.task.enableTaskPayloadManagerPerTask). When set on the overlord,
mm-less ingestion will push the task.json payload to deep storage before
launching a k8s job. When set on the peon, the CliPeon task injector will read
the task.json from deep storage instead of assuming it is available on the file
system. This is currently only useable by mm-less ingestion but it technically
could be used in any task running scenario.
- The K8s adapters in mm-less ingestion will check deep storage if the
config is set when converting K8s jobs to tasks in toTask
#### Release note
- Support compressed task payloads larger than 128KB in mm-less ingestion.
##### Key changed/added classes in this PR
* `CliPeon`
* `KubernetesPeonLifecycle `
* `PodTemplateTaskAdapter`
* `K8sTaskAdapter`
* `S3TaskLogs` `TaskLogs`
I can add some more documentation to this PR later but I wanted to get some
feedback on this approach before doing so.
This PR has:
- [X] been self-reviewed.
- [ ] using the [concurrency
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
(Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] a release note entry in the PR description.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [X] added comments explaining the "why" and the intent of the code
wherever would not be obvious for an unfamiliar reader.
- [X] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [ ] added integration tests.
- [X] been tested in a test Druid cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]