ayman-albaz opened a new issue, #28372:
URL: https://github.com/apache/airflow/issues/28372
### Apache Airflow version
2.5.0
### What happened
I have a dynamically mapped task that launches over 100 KubernetesPodOperator
tasks, each with a request of 2.0 CPUs. When the DAG runs, 16 tasks are in the
'running' state, but only 3 of them actually run; the remaining 13 fail with
`Pod took longer than 120 seconds to start`. The rest of the tasks stay queued
or scheduled, and whenever fewer than 16 tasks are active, more of them start
and mostly fail with the same error.
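A back-of-the-envelope check suggests why only about 3 pods fit. The node size comes from the deployment details below; the CPU overhead already claimed by the Airflow components is an assumed figure, not a measured one:

```python
# Rough capacity estimate; the overhead figure is an assumption, not measured.
node_cpus = 8           # 4 CPUs (x2) on the Kind host, per the deployment details
per_pod_cpu = 2.0       # CPU request assigned to each mapped task
airflow_overhead = 1.5  # assumed CPU already claimed by scheduler/webserver/etc.

schedulable = int((node_cpus - airflow_overhead) // per_pod_cpu)
print(schedulable)  # -> 3: only ~3 pods can be scheduled at once; the rest stay Pending
```

With the default `parallelism` of 16, Airflow hands out 16 running slots, but the cluster can only place about 3 of those pods; the other 13 sit Pending until the 120-second startup timeout fails them.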
Here is a snapshot of the cluster state:
```
kubectl -n airflow get all
NAME                                                  READY   STATUS              RESTARTS   AGE
pod/airflow-postgresql-0                              1/1     Running             5          2d
pod/airflow-scheduler-6dd68b485c-w8bhp                3/3     Running             19         2d
pod/airflow-statsd-586dbdcc6b-h4mnr                   1/1     Running             5          2d
pod/airflow-triggerer-95565b95d-phts7                 2/2     Running             14         2d
pod/airflow-webserver-599bb95bcd-7dtpk                1/1     Running             5          2d
pod/my-task-17dd038ca4d04164ba90f9c7f9a7fbb6          0/2     Pending             0          49s
pod/my-task-20aba86c65544ea384343f8fb4415d3a          0/2     Pending             0          53s
pod/my-task-3c5b4444a7d242459907ff3be7b7d6f6          0/2     Pending             0          44s
pod/my-task-5c8af5edb0904711b6a76a2edf1d1067          0/2     Pending             0          60s
pod/my-task-6001d3567f96400bb0ae559f22d3d2db          0/2     Pending             0          43s
pod/my-task-6dfb1945f3ff4ac4a06c7e6c6a85099c          0/2     Pending             0          81s
pod/my-task-71ad2fb48fb64f449014bba45bee980f          0/2     ContainerCreating   0          52s
pod/my-task-774216cb5f9344ffb35deac826d71639          0/2     Pending             0          68s
pod/my-task-814266d425254130868c3a5ebc8dce49          0/2     Pending             0          67s
pod/my-task-a11588d878b54944b4c069f49231ac36          0/2     Pending             0          77s
pod/my-task-b16c843fa038441ea31b90363ed86aa0          0/2     Pending             0          49s
pod/my-task-b85e2ed3417a4a62940661f418c900e5          0/2     Pending             0          60s
pod/my-task-d1de2a771a104a2592956a713f785300          0/2     Pending             0          73s
pod/my-task-dbeba55a80074c08bbdf023b3f0b885c          0/2     Completed           0          10m
pod/my-task-f83ad2805d314be3a7307b7216a54e53          2/2     Running             0          10m
pod/pipeline-my-task-0bc9e094afee4527b5b764e32f590282  0/1     Init:0/1            0          1s
pod/pipeline-my-task-1d51c5d3776e4dd8a89461e8a76faba1  1/1     Running             0          62s
pod/pipeline-my-task-24b1326a71d149fb9f62c101647468ee  1/1     Running             0          62s
pod/pipeline-my-task-29b132b7b0ce4832a5e30a821c6405bf  1/1     Running             0          10m
pod/pipeline-my-task-29fb55604eec457fa21d13d85c7889b5  1/1     Running             0          10m
pod/pipeline-my-task-2a337f1cc28b4315945cec8a961b1111  1/1     Running             0          69s
pod/pipeline-my-task-35d5c97570474082bc9b04189c433be7  1/1     Running             0          57s
pod/pipeline-my-task-569de133975d4dbb96becb2a04c0dac3  1/1     Running             0          78s
pod/pipeline-my-task-96a9681ace4441deba4faeef602f6e5b  1/1     Running             0          78s
pod/pipeline-my-task-9dcb9578720643eca5fa918a0a295f87  1/1     Running             0          87s
pod/pipeline-my-task-a643741d29ea4f4baa06e0ea20bc1a57  1/1     Running             0          10m
pod/pipeline-my-task-b04532a9f35a48a09cb1d46c9d9470dd  1/1     Running             0          57s
pod/pipeline-my-task-c9b7bb4ee07749be98083a11a512e1f4  1/1     Running             0          90s
pod/pipeline-my-task-d9c5ce9bf5ce499583cdf0ea3f58b7f0  1/1     Running             0          82s
pod/pipeline-my-task-dd5a5a45374f487fbc34c904e71b93b5  1/1     Running             0          59s
pod/pipeline-my-task-ea8c39d657824a1db505b00e8673b06a  1/1     Running             0          69s
pod/pipeline-my-task-fb5c71d274034f5392aebe0f4b395d98  1/1     Running             0          65s
```
### What you think should happen instead
Only 3 tasks should be in the 'running' state. The remaining tasks should stay
scheduled or queued instead of failing with a startup timeout.
### How to reproduce
```python
import pendulum
from airflow.decorators import dag, task
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)
from kubernetes.client import models as k8s


@dag(
    schedule=None,
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    catchup=False,
    tags=["example"],
)
def pipeline():
    container_resources = k8s.V1ResourceRequirements(
        limits={"memory": "512Mi", "cpu": 2.0},
        requests={"memory": "512Mi", "cpu": 2.0},
    )
    volumes = [
        k8s.V1Volume(
            name="pvc-airflow",
            persistent_volume_claim=k8s.V1PersistentVolumeClaimVolumeSource(
                claim_name="pvc-airflow"
            ),
        )
    ]
    volume_mounts = [
        k8s.V1VolumeMount(mount_path="/airflow", name="pvc-airflow", sub_path=None)
    ]

    @task
    def make_list():
        return [{"a": "a"}] * 100

    my_task = KubernetesPodOperator.partial(
        name="my_task",
        task_id="my_task",
        image="ubuntu:20.04",
        namespace="airflow",
        container_resources=container_resources,
        volumes=volumes,
        volume_mounts=volume_mounts,
        in_cluster=True,
        do_xcom_push=True,
        get_logs=True,
        cmds=["/bin/bash", "-c", "sleep 600"],
    ).expand(env_vars=make_list())


pipeline()
```
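Not part of the original report, but two operator arguments may be relevant here. `startup_timeout_seconds` controls the 120-second limit behind the error message, and `max_active_tis_per_dag` caps how many mapped task instances run at once. The specific values below are illustrative assumptions, not a confirmed fix:

```python
my_task = KubernetesPodOperator.partial(
    name="my_task",
    task_id="my_task",
    # ... same arguments as in the repro above ...
    startup_timeout_seconds=600,  # raise the default 120 s pod-startup timeout
    max_active_tis_per_dag=3,     # cap concurrent mapped task instances
).expand(env_vars=make_list())
```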
### Operating System
Ubuntu 20.04.5 LTS
### Versions of Apache Airflow Providers
_No response_
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
I am running this locally on Kind using the Helm chart.
My machine has 4 CPUs (x2) and 16 GB of RAM.
### Anything else
I have confirmed that the failing tasks never actually start: their pods sit
Pending waiting for resources until the startup timeout expires.
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)