ayman-albaz opened a new issue, #28372:
URL: https://github.com/apache/airflow/issues/28372

   ### Apache Airflow version
   
   2.5.0
   
   ### What happened
   
   I have a dynamic mapping task that is supposed to launch over 100 KubernetesPodOperator tasks, with 2.0 CPUs assigned per task. When I run the DAG, 16 tasks are in the 'running' state, but only 3 actually run; the remaining 13 fail with `Pod took longer than 120 seconds to start`. The rest of the tasks are either queued or scheduled, and whenever fewer than 16 tasks are active, more of them start and most of them fail with the same error.
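The error comes from the `KubernetesPodOperator` waiting for its pod to leave the `Pending` phase; 120 seconds is the operator's default `startup_timeout_seconds`. Below is a simplified, stdlib-only model of that wait, not the provider's actual code: `wait_for_start`, `get_phase`, and the injectable `clock`/`sleep` are hypothetical names used for illustration. A pod the Kubernetes scheduler cannot place stays `Pending`, so the loop eventually raises.

```python
import time
from typing import Callable


def wait_for_start(
    get_phase: Callable[[], str],
    startup_timeout: float = 120,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a pod's phase until it leaves "Pending" or the timeout expires.

    Simplified model only: `get_phase` stands in for a Kubernetes API call,
    and `clock`/`sleep` are injectable so the loop can run without waiting.
    """
    started = clock()
    while True:
        phase = get_phase()
        if phase != "Pending":
            # Running, Succeeded, Failed, ...: the pod got past scheduling.
            return phase
        if clock() - started >= startup_timeout:
            # Pods that never get a node (e.g. not enough free CPU) end up here.
            raise TimeoutError(
                f"Pod took longer than {startup_timeout:g} seconds to start"
            )
        sleep(poll_interval)


class FakeClock:
    """Advances only when sleep() is called, so the demo runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


if __name__ == "__main__":
    fake = FakeClock()
    try:
        # A pod that never leaves Pending hits the default 120 s timeout.
        wait_for_start(lambda: "Pending", clock=fake.time, sleep=fake.sleep)
    except TimeoutError as err:
        print(err)  # Pod took longer than 120 seconds to start
```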
   
   Here is a snapshot of the `airflow` namespace:
   ```
   kubectl -n airflow get all
   NAME                                                    READY   STATUS              RESTARTS   AGE
   pod/airflow-postgresql-0                                1/1     Running             5          2d
   pod/airflow-scheduler-6dd68b485c-w8bhp                  3/3     Running             19         2d
   pod/airflow-statsd-586dbdcc6b-h4mnr                     1/1     Running             5          2d
   pod/airflow-triggerer-95565b95d-phts7                   2/2     Running             14         2d
   pod/airflow-webserver-599bb95bcd-7dtpk                  1/1     Running             5          2d
   pod/my-task-17dd038ca4d04164ba90f9c7f9a7fbb6            0/2     Pending             0          49s
   pod/my-task-20aba86c65544ea384343f8fb4415d3a            0/2     Pending             0          53s
   pod/my-task-3c5b4444a7d242459907ff3be7b7d6f6            0/2     Pending             0          44s
   pod/my-task-5c8af5edb0904711b6a76a2edf1d1067            0/2     Pending             0          60s
   pod/my-task-6001d3567f96400bb0ae559f22d3d2db            0/2     Pending             0          43s
   pod/my-task-6dfb1945f3ff4ac4a06c7e6c6a85099c            0/2     Pending             0          81s
   pod/my-task-71ad2fb48fb64f449014bba45bee980f            0/2     ContainerCreating   0          52s
   pod/my-task-774216cb5f9344ffb35deac826d71639            0/2     Pending             0          68s
   pod/my-task-814266d425254130868c3a5ebc8dce49            0/2     Pending             0          67s
   pod/my-task-a11588d878b54944b4c069f49231ac36            0/2     Pending             0          77s
   pod/my-task-b16c843fa038441ea31b90363ed86aa0            0/2     Pending             0          49s
   pod/my-task-b85e2ed3417a4a62940661f418c900e5            0/2     Pending             0          60s
   pod/my-task-d1de2a771a104a2592956a713f785300            0/2     Pending             0          73s
   pod/my-task-dbeba55a80074c08bbdf023b3f0b885c            0/2     Completed           0          10m
   pod/my-task-f83ad2805d314be3a7307b7216a54e53            2/2     Running             0          10m
   pod/pipeline-my-task-0bc9e094afee4527b5b764e32f590282   0/1     Init:0/1            0          1s
   pod/pipeline-my-task-1d51c5d3776e4dd8a89461e8a76faba1   1/1     Running             0          62s
   pod/pipeline-my-task-24b1326a71d149fb9f62c101647468ee   1/1     Running             0          62s
   pod/pipeline-my-task-29b132b7b0ce4832a5e30a821c6405bf   1/1     Running             0          10m
   pod/pipeline-my-task-29fb55604eec457fa21d13d85c7889b5   1/1     Running             0          10m
   pod/pipeline-my-task-2a337f1cc28b4315945cec8a961b1111   1/1     Running             0          69s
   pod/pipeline-my-task-35d5c97570474082bc9b04189c433be7   1/1     Running             0          57s
   pod/pipeline-my-task-569de133975d4dbb96becb2a04c0dac3   1/1     Running             0          78s
   pod/pipeline-my-task-96a9681ace4441deba4faeef602f6e5b   1/1     Running             0          78s
   pod/pipeline-my-task-9dcb9578720643eca5fa918a0a295f87   1/1     Running             0          87s
   pod/pipeline-my-task-a643741d29ea4f4baa06e0ea20bc1a57   1/1     Running             0          10m
   pod/pipeline-my-task-b04532a9f35a48a09cb1d46c9d9470dd   1/1     Running             0          57s
   pod/pipeline-my-task-c9b7bb4ee07749be98083a11a512e1f4   1/1     Running             0          90s
   pod/pipeline-my-task-d9c5ce9bf5ce499583cdf0ea3f58b7f0   1/1     Running             0          82s
   pod/pipeline-my-task-dd5a5a45374f487fbc34c904e71b93b5   1/1     Running             0          59s
   pod/pipeline-my-task-ea8c39d657824a1db505b00e8673b06a   1/1     Running             0          69s
   pod/pipeline-my-task-fb5c71d274034f5392aebe0f4b395d98   1/1     Running             0          65s
   ```
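To make the snapshot easier to read, here is a small sketch that tallies pods by name prefix and `STATUS`. It assumes the plain-text `kubectl get` layout with each pod on a single line, and it only counts pods whose names end in a 32-character hex suffix (the `my-task-*` and `pipeline-my-task-*` pods), so the Airflow component pods are skipped. `tally_pod_statuses` is a hypothetical helper, not part of kubectl or Airflow.

```python
import re
from collections import Counter

# One `kubectl get` row for a pod named "<prefix>-<32 hex chars>", e.g.
# "pod/my-task-17dd...fbb6   0/2   Pending   0   49s".
ROW = re.compile(
    r"^pod/(?P<prefix>[a-z0-9-]+?)-[0-9a-f]{32}\s+"
    r"(?P<ready>\d+/\d+)\s+(?P<status>\S+)\s+\d+\s+\S+$"
)


def tally_pod_statuses(snapshot: str) -> Counter:
    """Count (name prefix, STATUS) pairs in a `kubectl get` snapshot.

    Header lines and pods without a 32-char hex suffix are ignored.
    """
    counts: Counter = Counter()
    for line in snapshot.splitlines():
        match = ROW.match(line.strip())
        if match:
            counts[(match["prefix"], match["status"])] += 1
    return counts


if __name__ == "__main__":
    sample = """
pod/airflow-postgresql-0                                1/1     Running             5          2d
pod/my-task-17dd038ca4d04164ba90f9c7f9a7fbb6            0/2     Pending             0          49s
pod/pipeline-my-task-1d51c5d3776e4dd8a89461e8a76faba1   1/1     Running             0          62s
"""
    for (prefix, status), n in sorted(tally_pod_statuses(sample).items()):
        print(f"{prefix:<18} {status:<18} {n}")
```

Run over the snapshot above with each row on one line, it should count 16 `pipeline-my-task` pods as `Running` and 1 as `Init:0/1`, against 12 `my-task` pods `Pending`, 1 `ContainerCreating`, 1 `Running` and 1 `Completed`.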
   
   ### What you think should happen instead
   
   Only 3 tasks should be running.
   The remaining tasks should stay scheduled or queued.
   
   ### How to reproduce
   
   ```python
   
   import pendulum
   
   from airflow.decorators import dag, task
   from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
       KubernetesPodOperator,
   )
   from kubernetes.client import models as k8s
   
   
   @dag(
       schedule=None,
       start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
       catchup=False,
       tags=["example"],
   )
   def pipeline():
       # Every mapped pod requests (and is limited to) 2 CPUs and 512Mi of memory.
       container_resources = k8s.V1ResourceRequirements(
           limits={
               "memory": "512Mi",
               "cpu": 2.0,
           },
           requests={
               "memory": "512Mi",
               "cpu": 2.0,
           },
       )
   
       # Shared PVC mounted at /airflow in every pod.
       volumes = [
           k8s.V1Volume(
               name="pvc-airflow",
               persistent_volume_claim=k8s.V1PersistentVolumeClaimVolumeSource(
                   claim_name="pvc-airflow"
               ),
           )
       ]
   
       volume_mounts = [
           k8s.V1VolumeMount(mount_path="/airflow", name="pvc-airflow", sub_path=None)
       ]
   
       @task
       def make_list():
           # 100 identical env-var dicts -> 100 mapped task instances.
           return [{"a": "a"}] * 100
   
       my_task = KubernetesPodOperator.partial(
           name="my_task",
           task_id="my_task",
           image="ubuntu:20.04",
           namespace="airflow",
           container_resources=container_resources,
           volumes=volumes,
           volume_mounts=volume_mounts,
           in_cluster=True,
           do_xcom_push=True,
           get_logs=True,
           cmds=[
               "/bin/bash",
               "-c",
               """
                   sleep 600
               """
           ],
       ).expand(env_vars=make_list())
   
   
   # Calling the decorated function creates and registers the DAG.
   pipeline()
   ```
   
   ### Operating System
   
   Ubuntu 20.04.5 LTS
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   I am running this locally on kind, using the official Helm chart.
   
   My machine has 4 CPUs (x2) and 16 GB of RAM.
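The Kubernetes scheduler only places a pod on a node whose unreserved CPU covers the pod's CPU *request* (limits do not affect placement), so a rough count of how many 2-CPU pods can run at once is integer division. This is a sketch under loose assumptions: it treats "4 CPUs (x2)" as 8 allocatable CPUs on a single kind node, and the `1500m` already requested by other pods in the usage lines is a made-up figure, not a measurement. `cpu_to_millicores` and `pods_that_fit` are hypothetical helpers.

```python
def cpu_to_millicores(quantity) -> int:
    """Convert a Kubernetes CPU quantity such as "2", 2.0 or "500m" to millicores."""
    text = str(quantity).strip()
    if text.endswith("m"):
        return int(text[:-1])
    return round(float(text) * 1000)


def pods_that_fit(allocatable, already_requested, per_pod_request) -> int:
    """How many pods requesting `per_pod_request` CPU fit in the remaining capacity.

    Only CPU requests are considered; memory and other resources are ignored.
    """
    free = cpu_to_millicores(allocatable) - cpu_to_millicores(already_requested)
    return max(free, 0) // cpu_to_millicores(per_pod_request)


if __name__ == "__main__":
    # Assumed: 8 allocatable CPUs on one node, 2.0 CPUs requested per task.
    print(pods_that_fit("8", "0", 2.0))      # -> 4 if nothing else requests CPU
    # Hypothetical: 1500m already requested by Airflow and system pods.
    print(pods_that_fit("8", "1500m", 2.0))  # -> 3
```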
   
   ### Anything else
   
   I have confirmed that the failing tasks do not start because they time out after waiting too long for resources.
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

