TejasMorbagal opened a new issue, #54550:
URL: https://github.com/apache/airflow/issues/54550
### Official Helm Chart version
1.18.0 (latest released)
### Apache Airflow version
3.0.4, 3.0.2
### Kubernetes Version
1.31
### Helm Chart configuration
```yaml
airflow:
  defaultAirflowTag: "3.0.4-python3.12"
  enabled: true
  airflowVersion: "3.0.4"
  fullnameOverride: "airflow-production"
  executor: KubernetesExecutor
  allowPodLaunching: true
  apiServer:
    defaultUser:
      enabled: false
    replicas: 1
    serviceAccount:
      create: false
      name: airflow-sa
      # annotations:
      #   eks.amazonaws.com/role-arn: arn:aws:iam::####:policy/s3_role
  redis:
    enabled: false
  postgresql:
    enabled: false
  env:
    - name: PYTHONPATH
      value: "/opt/airflow/dags:$PYTHONPATH"
  data:
    metadataSecretName: db-connection
    connectionsTemplates:
      ACCESS_KEY_ID:
        kind: secret
        name: aws-token
        key: AWS_ACCESS_KEY_ID
      SECRET_ACCESS_KEY:
        kind: secret
        name: aws-token
        key: AWS_SECRET_ACCESS_KEY
  config:
    scheduler:
      # Let heartbeats be up to 2–5 minutes old without declaring "dead"
      scheduler_health_check_threshold: 600
      parsing_processes: 2
    core:
      load_examples: "False"
    logging:
      remote_logging: "True"
      logging_level: "INFO"
      remote_log_conn_id: "s3_default"
      remote_base_log_folder: "s3://airflow/logs"
      encrypt_s3_logs: "False"
  ingress:
    apiServer:
      enabled: true
      ingressClassName: nginx
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt
        nginx.ingress.kubernetes.io/enable-cors: "true"
      host: hostname
      path: /
      pathType: Prefix
      tls:
        enabled: true
        secretName: airflow-tls-secret
  workers:
    serviceAccount:
      create: false
      name: airflow-sa
      # annotations:
      #   eks.amazonaws.com/role-arn: <ENTER_IAM_ROLE_ARN_CREATED_BY_EKSCTL_COMMAND>
    resources:
      requests:
        cpu: 200m
        memory: 2Gi
      limits:
        cpu: 500m
        memory: 5Gi
  scheduler:
    # Remote logging to S3 enabled, so disabled log groomer
    # env:
    #   - name: AIRFLOW__CORE__HOSTNAME_CALLABLE
    #     value: socket.gethostname
    logGroomerSidecar:
      enabled: true
      # keep short because S3 has the long-term copy
      env:
        - name: RETENTION_DAYS
          value: "3"
      command:
        - bash
        - -ec
        - |
          echo "Cleaning logs every 900 seconds"
          while true; do
            echo "Trimming airflow logs to ${RETENTION_DAYS:-3} days."
            find /opt/airflow/logs -mindepth 1 -type f -mtime +${RETENTION_DAYS:-3} -print -delete || true
            find /opt/airflow/logs -mindepth 1 -type d -empty -print -delete || true
            sleep 900
          done
    # ✅ Tweak startupProbe: give the scheduler time to emit its first heartbeat
    startupProbe:
      # command: ["bash","-ec","airflow jobs check --job-type SchedulerJob --local"]
      command:
        - /bin/bash
        - -c
        - airflow jobs check --job-type SchedulerJob --local
      initialDelaySeconds: 30
      periodSeconds: 10
      timeoutSeconds: 30
      failureThreshold: 10  # loop handles retries
    # ✅ Liveness after startup is stable
    livenessProbe:
      # command: ["bash","-ec","airflow jobs check --job-type SchedulerJob --local"]
      command:
        - /bin/bash
        - -c
        - airflow jobs check --job-type SchedulerJob --local
      failureThreshold: 3
      periodSeconds: 30
      timeoutSeconds: 30
    resources:
      requests:
        cpu: 2
        memory: 5Gi
      limits:
        cpu: 4
        memory: 10Gi
  dagProcessor:
    enabled: true
    resources:
      requests:
        cpu: 1
        memory: 10Gi
      limits:
        cpu: 3
        memory: 12Gi
  dags:
    gitSync:
      enabled: true
      repo: https://github.com/my-org/dags.git
      branch: main
      rev: HEAD
      depth: 1
      maxFailures: 0
      subPath: "dags"
      credentialsSecret: git-credentials
  triggerer:
    enabled: true
  migrateDatabaseJob:
    enabled: true
    applyCustomEnv: false
    useHelmHooks: false  # using Argo CD hooks instead of Helm hooks
    jobAnnotations:
      argocd.argoproj.io/hook: PreSync
      # Keep the last hook object around until the next sync so you can inspect logs
      argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
    ttlSecondsAfterFinished: null
  createUserJob:
    useHelmHooks: false
    applyCustomEnv: false
```
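As a side note on the custom log-groomer command in the config above: its retention step is just two `find` expressions, which can be sanity-checked locally. A minimal sketch, using a throwaway temp directory in place of `/opt/airflow/logs` (all file names here are hypothetical):

```shell
# Stand-in for /opt/airflow/logs: a scratch directory with one fresh and one stale file.
LOGDIR=$(mktemp -d)
mkdir -p "$LOGDIR/dag_id=example"
touch "$LOGDIR/dag_id=example/fresh.log"                   # recent, should survive
touch -d "10 days ago" "$LOGDIR/dag_id=example/stale.log"  # older than retention, should go
RETENTION_DAYS=3
# Same two expressions as the sidecar command: delete files older than
# RETENTION_DAYS days, then prune any directories left empty.
find "$LOGDIR" -mindepth 1 -type f -mtime +${RETENTION_DAYS:-3} -print -delete || true
find "$LOGDIR" -mindepth 1 -type d -empty -print -delete || true
ls "$LOGDIR/dag_id=example"
```

Note that `touch -d "10 days ago"` is GNU coreutils syntax, so this sketch assumes a Linux environment like the Airflow image itself.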
### Docker Image customizations
No docker image customizations
### What happened
1. The scheduler doesn't start: it hangs indefinitely, and the liveness probe eventually fails.
2. All other components run fine.
3. Interestingly, the liveness probe command succeeds when run manually in the pod's container:
```
airflow@airflow-aws-production-scheduler-85bf4f4747-9ckpz:/opt/airflow$ airflow jobs check --job-type SchedulerJob --local
Found one alive job.
```
Below is the scheduler pod log:
```
k exec -it airflow-aws-production-scheduler-85bf4f4747-9ckpz /bin/bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Defaulted container "scheduler" out of: scheduler, scheduler-log-groomer, wait-for-airflow-migrations (init)
airflow@airflow-aws-production-scheduler-85bf4f4747-9ckpz:/opt/airflow$ airflow scheduler -v --stderr err.txt --stdout out.txt
[2025-08-15T14:17:59.463+0000] {providers_manager.py:356} DEBUG - Initializing Providers Manager[config]
[2025-08-15T14:17:59.465+0000] {providers_manager.py:356} DEBUG - Initializing Providers Manager[list]
[2025-08-15T14:17:59.662+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.amazon.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-amazon
[2025-08-15T14:17:59.675+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.celery.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-celery
[2025-08-15T14:17:59.679+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.cncf.kubernetes.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-cncf-kubernetes
[2025-08-15T14:17:59.683+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.common.compat.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-common-compat
[2025-08-15T14:17:59.685+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.common.io.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-common-io
[2025-08-15T14:17:59.687+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.common.messaging.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-common-messaging
[2025-08-15T14:17:59.688+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.common.sql.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-common-sql
[2025-08-15T14:17:59.690+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.docker.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-docker
[2025-08-15T14:17:59.692+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.elasticsearch.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-elasticsearch
[2025-08-15T14:17:59.694+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.fab.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-fab
[2025-08-15T14:17:59.697+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.ftp.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-ftp
[2025-08-15T14:17:59.699+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.git.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-git
[2025-08-15T14:17:59.701+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.google.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-google
[2025-08-15T14:17:59.713+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.grpc.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-grpc
[2025-08-15T14:17:59.715+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.hashicorp.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-hashicorp
[2025-08-15T14:17:59.717+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.http.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-http
[2025-08-15T14:17:59.719+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.microsoft.azure.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-microsoft-azure
[2025-08-15T14:17:59.724+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.mysql.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-mysql
[2025-08-15T14:17:59.726+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.odbc.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-odbc
[2025-08-15T14:17:59.727+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.openlineage.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-openlineage
[2025-08-15T14:17:59.730+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.postgres.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-postgres
[2025-08-15T14:17:59.732+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.redis.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-redis
[2025-08-15T14:17:59.734+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.sendgrid.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-sendgrid
[2025-08-15T14:17:59.736+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.sftp.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-sftp
[2025-08-15T14:17:59.737+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.slack.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-slack
[2025-08-15T14:17:59.739+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.smtp.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-smtp
[2025-08-15T14:17:59.741+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.snowflake.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-snowflake
[2025-08-15T14:17:59.743+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.ssh.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-ssh
[2025-08-15T14:17:59.744+0000] {providers_manager.py:598} DEBUG - Loading EntryPoint(name='provider_info', value='airflow.providers.standard.get_provider_info:get_provider_info', group='apache_airflow_provider') from package apache-airflow-providers-standard
[2025-08-15T14:17:59.746+0000] {providers_manager.py:359} DEBUG - Initialization of Providers Manager[list] took 0.28 seconds
[2025-08-15T14:17:59.746+0000] {configuration.py:1871} DEBUG - Loading providers configuration
[2025-08-15T14:17:59.768+0000] {providers_manager.py:359} DEBUG - Initialization of Providers Manager[config] took 0.30 seconds
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
[2025-08-15T14:17:59.889+0000] {plugins_manager.py:353} DEBUG - Loading plugins
[2025-08-15T14:17:59.889+0000] {plugins_manager.py:269} DEBUG - Loading plugins from directory: /opt/airflow/plugins
[2025-08-15T14:17:59.889+0000] {plugins_manager.py:249} DEBUG - Loading plugins from entrypoints
[2025-08-15T14:17:59.890+0000] {plugins_manager.py:252} DEBUG - Importing entry_point plugin openlineage
[2025-08-15T14:18:00.165+0000] {plugins_manager.py:365} DEBUG - Loading 1 plugin(s) took 275.59 seconds
[2025-08-15T14:18:00.165+0000] {listener.py:37} DEBUG - Calling 'on_starting' with {'component': <airflow.jobs.job.Job object at 0x7fd2d27c2d50>}
[2025-08-15T14:18:00.165+0000] {listener.py:38} DEBUG - Hook impls: []
[2025-08-15T14:18:00.166+0000] {listener.py:42} DEBUG - Result from 'on_starting': []
[2025-08-15T14:18:00.184+0000] {scheduler_job_runner.py:996} INFO - Starting the scheduler
[2025-08-15T14:18:00.185+0000] {scheduler_job_runner.py:1006} DEBUG - Using DatabaseCallbackSink as callback sink.
[2025-08-15T14:18:00.185+0000] {executor_loader.py:257} DEBUG - Loading executor :KubernetesExecutor: from core
```
### What you think should happen instead
_No response_
### How to reproduce
Install Helm chart 1.18.0 with the KubernetesExecutor.
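For a direct (non-umbrella) install, a minimal sketch of the reproduction might look like the fragment below; the release name, namespace, and values file name are placeholders, not taken from the report:

```yaml
# values.yaml — minimal sketch for reproducing with the official chart 1.18.0:
#   helm repo add apache-airflow https://airflow.apache.org
#   helm install airflow apache-airflow/airflow --version 1.18.0 -n airflow -f values.yaml
executor: KubernetesExecutor
airflowVersion: "3.0.4"
defaultAirflowTag: "3.0.4-python3.12"
```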
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)