georgew5656 opened a new pull request, #14802: URL: https://github.com/apache/druid/pull/14802
The main motivation behind this PR is to fix a bug I noticed with large task payloads (>~1MB). Since we compress the task.json and then store it in an environment variable (TASK_JSON), we are limited by the max environment variable size of the (usually linux) OS we are running the peon on, which is normally 128 KB (https://askubuntu.com/questions/1385551/how-long-can-display-environment-variable-value-be#:~:text=As%20a%20result%2C%20the%20maximum,131%2C072%20single%2Dbyte%20ASCII%20characters.) This issue can be replicated by trying to create a task with a payload that gzip compresses to >128KB. You'll get a "Argument list too long" when starting the container. ### Description I think the best way to get around this problem is to introduce a flag into the internal peon command that lets you choose to have the peon pull the task payload from the overlord during startup, rather than expecting it to be provided via environment variable. The problem with this solution is that the KubernetesTaskRunner currently relies on a toTask(Job) interface on each of the taskAdapters to restore tasks from k8s. When the runner is restarted, it lists the running jobs in the cluster and then uses the TASK_JSON environment variable to reconstruct the task. However, I don't think we actually need to do this, because for a task to be restored properly it needs to also be available in TaskStorage. If a job is restored from K8s in the KubernetesTaskRunner, but it's not available in taskStorage, then you get errors like below. ``` java.lang.RuntimeException: org.apache.druid.java.util.common.ISE: Unable to grant lock to inactive Task [index_parallel_wikipedia_lngdjhog_2023-08-10T14:00:12.586Z] ``` And eventually TaskQueue sends a request to the KubernetesTaskRunner to shut the task down. ``` 2023-08-10T14:01:12,443 INFO [TaskQueue-Manager] org.apache.druid.k8s.overlord.KubernetesTaskRunner - Shutdown [index_parallel_wikipedia_lngdjhog_2023-08-10T14:00:12.586Z] because [Task is not in knownTaskIds] ``` This is reproducible by deploying the overlord with HeapMemoryTaskStorage (instead of storing tasks in a metadataDB) creating a long running task (like a kafka ingest or bigger batch ingest), then restarting the overlord. The KubernetesTaskRunner will restore the task but it will get killed by TaskQueue since the task will not be in TaskStorage. Because of this, I updated the restore() method to check each Job (by taskId) in TaskStorage before restoring it. This has the side effect of letting us deprecate toTask(Job) on the TaskAdapter, so we can add the feature to have the peons pull the task.json from the overlord instead of providing it via environment variable. Will have a second PR that actually adds the flag to get the peon to pull the task.json instead of passing it via TASK_JSON soon. #### Release note - Update KubernetesTaskRunner.restore() to check with TaskStorage before restoring a task. ##### Key changed/added classes in this PR * KubernetesTaskRunner * K8sTaskAdapter/PodTemplateTaskAdapter This PR has: - [X] been self-reviewed. - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.) - [ ] added documentation for new or modified features or behaviors. - [ ] a release note entry in the PR description. - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links. - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md) - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader. - [X] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met. - [ ] added integration tests. - [X] been tested in a test Druid cluster. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
