georgew5656 opened a new pull request, #14887:
URL: https://github.com/apache/druid/pull/14887

   This change is meant to fix a issue where passing too large of a task 
payload to the mm-less task runner will cause the peon to fail to startup 
because the payload is passed (compressed) as a environment variable 
(TASK_JSON). In linux systems the limit for a environment variable is commonly 
128KB, for windows systems less than this. Setting a env variable longer than 
this results in a bunch of "Argument list too long" errors.
   
   ### Description
   The goal of this patch is to prevent larger tasks from failing with mm-less 
ingestion.
   
   To address the immediate problem (setting environment variables that are too 
large), I added a additional config for the KubernetesTaskRunner 
(druid.indexer.runner.taskPayloadAsEnvVariable) that defaults to true but can 
be optionally set to false. Setting this config to false will cause the K8s 
adapters to not set the task payload as the TASK_JSON env variable. This 
prevents the Jobs from failing to come up.
   
   We still need to pass the task.json payload to the peons somehow. I explored 
three options for this, and ended up going with option 3.
   
   1. Using configmaps to store the task payload and then mounting them onto 
the created peon pods. 
   I decided not to go with this option because configmaps still have a 1MB 
size limit and I was concerned with the KubernetesTaskRunner having to manage a 
bunch of configmaps in addition to jobs. Having this many configmaps also 
pollutes K8s metadata, making it hard to see anything else going on.
   
   2. Updating CliPeon to use the getTaskPayload endpoint on the overlord to 
pull the task.json payload on startup. This didn't work because we currently 
have a guice injector in the peon that requires the task.json be available at 
injection time. In order to pull the task.json from the overlord, we need to 
use the ServiceLocator class which is only available once the peon lifecycle 
has already started (after injection). Changing this would have required many 
changes to the code so I didn't want to do it. Additionally, I would have had 
to deprecate the toTask interface on the overlords since there would be no way 
for the overlord to turn a K8s Job into a task definition.
   
   3. Push the task payload into task logs deep storage and have the peon read 
the payload.
   I ended up going with this option because it was the most simple to 
implement and the most future-proof (no worry about task payloads getting 
larger than 1MB). The task logs killer will automatically delete the task.json 
in deep storage alongside he task logs whenever it is run.
   
   To implement option 3 I included several changes
   - Introduced a new interface (TaskPayloadManager) that exposes two methods 
(push task payload, pull task payload) and is implemented by TaskLogs. (I only 
implemented it for S3TaskLogs, but it should be easy to implement for other 
deep storage systems).
   - Introduce a new config in TaskConfig 
(druid.indexer.task.enableTaskPayloadManagerPerTask). When set on the overlord, 
mm-less ingestion will push the task.json payload to deep storage before 
launching a k8s job. When set on the peon, the CliPeon task injector will read 
the task.json from deep storage instead of assuming it is available on the file 
system. This is currently only useable by mm-less ingestion but it technically 
could be used in any task running scenario.
   - The K8s adapters in mm-less ingestion will check deep storage if the 
config is set when converting K8s jobs to tasks in toTask
   
   #### Release note
   - Support compressed task payloads larger than 128KB in mm-less ingestion.
   
   ##### Key changed/added classes in this PR
    * `CliPeon`
    * `KubernetesPeonLifecycle `
    * `PodTemplateTaskAdapter`
    * `K8sTaskAdapter`
    * `S3TaskLogs` `TaskLogs`
   
   
   
   
   I can add some more documentation to this PR later but I wanted to get some 
feedback on this approach before doing so.
   
   This PR has:
   
   - [X] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [X] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [X] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [X] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to