GitHub user shivshav edited a discussion: `{pod_generator.py:477} WARNING - Model file does not exist` output by every task
# Official Helm Chart version
1.6.0 (an upgrade to 1.15.0 is pending until we're confident in how the switch to `KubernetesExecutor` works).
# Apache Airflow version
2.10.4
# Kubernetes Version
1.23
# Helm Chart configuration
```yaml
airflowHome: /opt/airflow
airflowLocalSettings: |-
  # Note: This is all just Python code. Because YAML and Python both care about indentation,
  # you're probably better off copying and pasting this from a .py file rather than editing
  # in-line for most bigger changes
  import logging

  from airflow.exceptions import AirflowClusterPolicyViolation
  from airflow.models import BaseOperator, TaskInstance
  import kubernetes.client.models as k8s

  leafly_task_resource_sizes = {
      "medium": k8s.V1ResourceRequirements(
          requests={
              "cpu": "400m",
              "memory": "1.5Gi",
          },
          limits={
              "cpu": "400m",
              "memory": "1.5Gi",
          }
      ),
      "large": k8s.V1ResourceRequirements(
          requests={
              "cpu": "1.5",
              "memory": "6Gi",
          },
          limits={
              "cpu": "1.5",
              "memory": "6Gi",
          }
      ),
  }

  def task_policy(task: BaseOperator):
      task_size = task.params.get("task_size")
      logger = logging.getLogger(__name__)
      logger.info("task_policy hook")

      if not task_size:  # fall back to default worker size
          return

      if leafly_task_resource_sizes.get(task_size) is None:
          raise AirflowClusterPolicyViolation(f"task size '{task_size}' is not supported")

      task.executor_config = {
          "pod_override": k8s.V1Pod(
              metadata=k8s.V1ObjectMeta(
                  labels={
                      "airflow.k8s.leafly.io/task-size": task_size,
                  }
              ),
              spec=k8s.V1PodSpec(
                  containers=[
                      k8s.V1Container(
                          name="base",
                          resources=leafly_task_resource_sizes[task_size],
                      )
                  ]
              )
          )
      }
airflowVersion: 2.10.4
allowPodLaunching: true
cleanup:
enabled: false
config:
api:
auth_backends: airflow.api.auth.backend.default
celery:
worker_concurrency: 16
core:
dags_are_paused_at_creation: "True"
dags_folder: '{{ include "airflow_dags" . }}'
donot_pickle: "True"
encrypt_s3_logs: "False"
execute_tasks_new_python_interpreter: "True"
executor: '{{ .Values.executor }}'
hide_sensitive_var_conn_fields: "True"
hostname_callable: airflow.utils.net.get_host_ip_address
load_examples: "False"
parallelism: 16
remote_log_conn_id: aws_s3
remote_logging: "True"
database:
sql_alchemy_pool_recycle: 3600
elasticsearch: null
elasticsearch_configs: null
kerberos: null
kubernetes: null
kubernetes_executor:
worker_pods_creation_batch_size: 4
logging:
colored_console_log: "False"
encrypt_s3_logs: "False"
fab_logging_level: WARN
logging_level: INFO
remote_base_log_folder: s3://<REDACTED>
remote_log_conn_id: aws_s3
remote_logging: "True"
metrics:
statsd_host: $STATSD_HOST
statsd_on: "True"
statsd_port: 8125
statsd_prefix: airflow
scheduler:
run_duration: -1
task_queued_timeout: 600
secrets:
backend: airflow.providers.hashicorp.secrets.vault.VaultBackend
backend_kwargs: '{"connections_path": "connections", "variables_path":
"variables",
"mount_point": "airflow", "url": "<REDACTED>", "auth_type":
"kubernetes",
"kubernetes_role": "airflow"}'
sensors:
default_timeout: 3600
webserver:
enable_proxy_fix: "True"
expose_config: "True"
rbac: "True"
dags:
gitSync:
branch: master
depth: 1
enabled: true
knownHosts: |
      github.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOMqqnkVzrm0SdG6UOoqKLsabgH5C9okWi0dh2l9GKJl
      github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=
      github.com ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCj7ndNxQowgcQnjshcLrqPEiiphnt+VTTvDP6mHBL9j1aNUkY4Ue1gvwnGLVlOhGeYrnZaMgRK6+PKCUXaDbC7qtbW8gIkhL7aGCsOr/C56SJMy/BCZfxd1nWzAOxSDPgVsmerOBYfNqltV9/hWCqBywINIR+5dIg6JTJ72pcEpEjcYgXkE2YEFXV1JHnsKgbLWNlhScqb2UmyRkQyytRLtL+38TGxkxCflmO+5Z8CSSNY7GidjMIZ7Q4zMjA2n1nGrlTDkzwDCsw+wqFPGQA179cnfGWOWRVruj16z6XyvxvjJwbz0wQZ75XK5tKSb7FNyeIEs4TT4jk+S4dhPeAUC5y+bDYirYgM4GC7uEnztnZyaVWQ7B381AK4Qdrwt51ZqExKbQpTUNn+EjqoTwvqNj4kqx5QUCI0ThS/YkOxJCXmPUWZbhjpCg56i+2aB6CmK2JGhn57K5mj0MNdBXA4/WnwH6XoPWJzK5Nyu2zB3nAZp+S5hpQs+p1vN1/wsjk=
maxFailures: 5
repo: <REDACTED>
resources:
limits:
memory: 256Mi
requests:
cpu: 10m
memory: 128Mi
rev: HEAD
sshKeySecret: airflow
subPath: ""
persistence:
enabled: false
data:
brokerUrl: <REDACTED>
metadataSecretName: airflow-metadata-db-url
resultBackendSecretName: airflow-results-db-url
defaultAirflowRepository: <REDACTED>
defaultAirflowTag: <REDACTED>
elasticsearch:
enabled: false
executor: KubernetesExecutor
extraEnv: |
- name: STATSD_HOST
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.hostIP
fernetKeySecretName: airflow
flower:
enabled: true
resources:
limits:
memory: 512Mi
requests:
cpu: 20m
memory: 256Mi
serviceAccount:
create: true
images:
airflow:
repository: null
tag: null
flower:
pullPolicy: IfNotPresent
repository: null
tag: null
pod_template:
pullPolicy: IfNotPresent
repository: null
tag: null
ingress:
enabled: true
flower:
annotations:
ingress.kubernetes.io/force-ssl-redirect: "true"
ingress.kubernetes.io/rewrite-target: /
kubernetes.io/ingress.class: nginx-internal
host: <REDACTED>
web:
annotations:
ingress.kubernetes.io/force-ssl-redirect: "true"
kubernetes.io/ingress.class: nginx-internal
host: <REDACTED>
labels:
tags.datadoghq.com/env: production
tags.datadoghq.com/service: airflow
logs:
persistence:
enabled: false
migrateDatabaseJob:
jobAnnotations:
argocd.argoproj.io/hook: PreSync
serviceAccount:
annotations:
argocd.argoproj.io/hook: PreSync
create: true
multiNamespaceMode: false
pgbouncer:
enabled: false
postgresql:
enabled: false
rbac:
create: false
redis:
enabled: false
scheduler:
livenessProbe:
command:
- sh
- -c
- |
        CONNECTION_CHECK_MAX_COUNT=0 AIRFLOW__LOGGING__LOGGING_LEVEL=ERROR exec /entrypoint \
        airflow jobs check --job-type SchedulerJob --hostname $(hostname -i)
logGroomerSidecar:
enabled: true
resources:
limits:
memory: 256Mi
requests:
cpu: 10m
memory: 64Mi
nodeSelector:
lifecycle: Spot
podAnnotations:
    ad.datadoghq.com/scheduler.logs: '[{"source": "airflow", "service": "airflow"}]'
replicas: 1
resources:
limits:
cpu: 2
memory: 6Gi
requests:
cpu: 2
memory: 2Gi
safeToEvict: true
serviceAccount:
create: true
tolerations:
- key: spot
operator: Equal
value: "true"
- key: compute
operator: Equal
value: "true"
statsd:
enabled: false
triggerer:
livenessProbe:
command:
- sh
- -c
- |
        CONNECTION_CHECK_MAX_COUNT=0 AIRFLOW__LOGGING__LOGGING_LEVEL=ERROR exec /entrypoint \
        airflow jobs check --job-type TriggererJob --hostname $(hostname -i)
webserver:
allowPodLogReading: true
defaultUser:
enabled: false
podAnnotations:
    ad.datadoghq.com/webserver.logs: '[{"source": "airflow", "service": "airflow"}]'
replicas: 1
resources:
limits:
memory: 3Gi
requests:
cpu: 500m
memory: 1Gi
serviceAccount:
annotations:
eks.amazonaws.com/role-arn: <REDACTED>
eks.amazonaws.com/sts-regional-endpoints: "true"
eks.amazonaws.com/token-expiration: "86400"
create: true
webserverSecretKeySecretName: airflow
workers:
keda:
enabled: false
logGroomerSidecar:
resources: {}
nodeSelector:
lifecycle: Spot
persistence:
enabled: false
podAnnotations:
    ad.datadoghq.com/worker.logs: '[{"source": "airflow", "service": "airflow"}]'
resources:
limits:
cpu: 1
memory: 6Gi
requests:
cpu: 200m
memory: 1Gi
safeToEvict: false
serviceAccount:
annotations:
eks.amazonaws.com/role-arn: <REDACTED>
eks.amazonaws.com/sts-regional-endpoints: "true"
eks.amazonaws.com/token-expiration: "86400"
create: true
strategy:
rollingUpdate:
maxSurge: 100%
maxUnavailable: 50%
tolerations:
- key: spot
operator: Equal
value: "true"
- key: compute
operator: Equal
value: "true"
updateStrategy: null
```
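For context on how the `task_policy` above gets used: tasks opt into a size via `params`. A minimal sketch follows; the DAG and task here are hypothetical, not from our actual codebase:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical DAG: the task_policy cluster policy reads params["task_size"]
# and attaches the matching pod_override to the task's executor_config.
with DAG(dag_id="example_dag", start_date=datetime(2025, 1, 1), schedule=None) as dag:
    big_task = BashOperator(
        task_id="big_task",
        bash_command="echo heavy work",
        params={"task_size": "large"},  # looked up in leafly_task_resource_sizes
    )
```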
# What happened
We recently switched to the `KubernetesExecutor` (the migration is still in progress, which is why some `CeleryExecutor` configuration options are still present). Every task we run emits the log output mentioned in the title. I've added the surrounding messages from the same task pod, since I know that was requested previously on similar issues:
```
[2025-02-27T18:05:47.971+0000] {standard_task_runner.py:105} INFO - Job 2147416: Subtask <REDACTED>
[2025-02-27T18:05:48.075+0000] {task_command.py:467} INFO - Running <TaskInstance: <REDACTED> scheduled__2025-02-27T17:30:00+00:00 [running]> on host 100.99.130.23
[2025-02-27T18:05:48.395+0000] {pod_generator.py:477} WARNING - Model file  does not exist
[2025-02-27T18:05:48.441+0000] {taskinstance.py:3132} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='<REDACTED>' AIRFLOW_CTX_TASK_ID='<REDACTED>' AIRFLOW_CTX_EXECUTION_DATE='2025-02-27T17:30:00+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='scheduled__2025-02-27T17:30:00+00:00'
...
```
The tasks complete successfully, so there's effectively no functional problem; we just get this warning for every task that runs, which adds to our log volume. In tracing it down, it seems it's likely [this line](https://github.com/apache/airflow/blob/c083e456fa02c6cb32cdbe0c9ed3c3b2380beccd/airflow/providers/cncf/kubernetes/pod_generator.py#L559)? It's worth noting that there's an extra space between `file` and `does` in the `Model file  does not exist` line, which makes me think that function is most likely being called with an empty argument.
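For reference, here's a rough paraphrase of the logic around that line (a sketch based on my reading of the linked provider source, not a verbatim copy). With an empty `path`, the `%s` placeholder renders as nothing, which would produce exactly the double space above:

```python
import logging
import os

import yaml

log = logging.getLogger(__name__)


# Rough paraphrase of PodGenerator.deserialize_model_file from the linked
# pod_generator.py; names and structure approximate the provider source.
def deserialize_model_file(path: str):
    if os.path.exists(path):
        with open(path) as stream:
            pod = yaml.safe_load(stream)
    else:
        pod = None
        # With path == "", this renders as "Model file  does not exist"
        # (double space), matching the warning we're seeing.
        log.warning("Model file %s does not exist", path)
    return pod
```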
# What you think should happen instead
The warning should not be printed at all.
# Deployment
Official Apache Airflow Helm Chart
# Deployment Details
Deployed on a self-managed cluster in AWS
# Anything else
Initially I suspected it was because the worker pods **do not** have the `pod_template_file.yaml` mounted, so I added it using `workers.extraVolumeMounts` in the Helm chart (see the sketch below), but that didn't seem to help.
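For reference, the mount looked roughly like this (a sketch; the ConfigMap name and paths are illustrative, not our exact values):

```yaml
workers:
  extraVolumes:
    - name: pod-template
      configMap:
        name: airflow-pod-template  # assumed ConfigMap holding the template
  extraVolumeMounts:
    - name: pod-template
      mountPath: /opt/airflow/pod_templates/pod_template_file.yaml
      subPath: pod_template_file.yaml
      readOnly: true
```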
Additionally, there are a few similar discussions I've run across:
- https://github.com/apache/airflow/discussions/32043
- https://github.com/apache/airflow/discussions/35419
One of these seems to allude to Airflow possibly looking for a custom pod_template_file that's overridden per task / in the executor config, as sketched below. The key difference here is that no path is output in our message, so I'm not sure what it would be looking for or why.
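As far as I understand it, a per-task template override would look something like the following (a hypothetical task; we don't set `pod_template_file` in `executor_config` anywhere, which is part of why the empty path is confusing):

```python
from airflow.operators.bash import BashOperator

# Hypothetical task-level override: KubernetesExecutor documents a
# "pod_template_file" key in executor_config for per-task templates.
templated_task = BashOperator(
    task_id="templated_task",
    bash_command="echo hello",
    executor_config={
        "pod_template_file": "/opt/airflow/pod_templates/pod_template_file.yaml",
    },
)
```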
Furthermore, I've been able to exec into the pods and `cat`/`ls`/`cd` the pod_template_file I mounted, as well as the directory it lives in, as the default `airflow` user, so I don't believe this is a file-permissions issue either.
Our Kubernetes provider version:
`apache-airflow-providers-cncf-kubernetes==10.0.1`
Please let me know if there's additional information I can provide; thank you in advance for any help 🙏🏾
# Are you willing to submit a PR?
- [X] If it is a Helm configuration issue or a fairly simple Python change, yes.
# Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
GitHub link: https://github.com/apache/airflow/discussions/47166