georgew5656 opened a new pull request, #14802:
URL: https://github.com/apache/druid/pull/14802

   The main motivation behind this PR is to fix a bug I noticed with large task 
payloads (>~1MB). Since we compress the task.json and then store it in an 
environment variable (TASK_JSON), we are limited by the max environment 
variable size of the (usually linux) OS we are running the peon on, which is 
normally 128 KB 
(https://askubuntu.com/questions/1385551/how-long-can-display-environment-variable-value-be#:~:text=As%20a%20result%2C%20the%20maximum,131%2C072%20single%2Dbyte%20ASCII%20characters.)
   
   This issue can be replicated by trying to create a task with a payload that 
gzip compresses to >128KB. You'll get a "Argument list too long" when starting 
the container.
   
   ### Description
   I think the best way to get around this problem is to introduce a flag into 
the internal peon command that lets you choose to have the peon pull the task 
payload from the overlord during startup, rather than expecting it to be 
provided via environment variable.
   
   The problem with this solution is that the KubernetesTaskRunner currently 
relies on a toTask(Job) interface on each of the taskAdapters to restore tasks 
from k8s. When the runner is restarted, it lists the running jobs in the 
cluster and then uses the TASK_JSON environment variable to reconstruct the 
task.
   
   However, I don't think we actually need to do this, because for a task to be 
restored properly it needs to also be available in TaskStorage. If a job is 
restored from K8s in the KubernetesTaskRunner, but it's not available in 
taskStorage, then you get errors like below.
   ```
   java.lang.RuntimeException: org.apache.druid.java.util.common.ISE: Unable to 
grant lock to inactive Task 
[index_parallel_wikipedia_lngdjhog_2023-08-10T14:00:12.586Z]
   ```
   
   And eventually TaskQueue sends a request to the KubernetesTaskRunner to shut 
the task down.
   ```
   2023-08-10T14:01:12,443 INFO [TaskQueue-Manager] 
org.apache.druid.k8s.overlord.KubernetesTaskRunner - Shutdown 
[index_parallel_wikipedia_lngdjhog_2023-08-10T14:00:12.586Z] because [Task is 
not in knownTaskIds]
   ```
   This is reproducible by deploying the overlord with HeapMemoryTaskStorage 
(instead of storing tasks in a metadataDB) creating a long running task (like a 
kafka ingest or bigger batch ingest), then restarting the overlord. The 
KubernetesTaskRunner will restore the task but it will get killed by TaskQueue 
since the task will not be in TaskStorage.
   
   Because of this, I updated the restore() method to check each Job (by 
taskId) in TaskStorage before restoring it. This has the side effect of letting 
us deprecate toTask(Job) on the TaskAdapter, so we can add the feature to have 
the peons pull the task.json from the overlord instead of providing it via 
environment variable.
   
   Will have a second PR that actually adds the flag to get the peon to pull 
the task.json instead of passing it via TASK_JSON soon.
   #### Release note
   - Update KubernetesTaskRunner.restore() to check with TaskStorage before 
restoring a task.
   
   ##### Key changed/added classes in this PR
    * KubernetesTaskRunner
    * K8sTaskAdapter/PodTemplateTaskAdapter
   
   This PR has:
   
   - [X] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [X] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [X] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to