GitHub user shivshav edited a discussion: `{pod_generator.py:477} WARNING - Model file  does not exist` output by every task

# Official Helm Chart version
1.6.0 (an upgrade to 1.15.0 is pending until we are confident in how the switch to `KubernetesExecutor` works).

# Apache Airflow version
2.10.4

# Kubernetes Version
1.23

# Helm Chart configuration
```yaml
  airflowHome: /opt/airflow
  airflowLocalSettings: |-
    # Note: This is all just Python code. Because YAML and Python both care about indentation, you're probably better off
    # copying and pasting this from a .py file rather than editing in-line for most bigger changes
    import logging

    from airflow.exceptions import AirflowClusterPolicyViolation
    from airflow.models import BaseOperator, TaskInstance
    import kubernetes.client.models as k8s

    leafly_task_resource_sizes = {
        "medium": k8s.V1ResourceRequirements(
            requests={
                "cpu": "400m",
                "memory": "1.5Gi",
            },
            limits={
                "cpu": "400m",
                "memory": "1.5Gi",
            }
        ),
        "large": k8s.V1ResourceRequirements(
            requests={
                "cpu": "1.5",
                "memory": "6Gi",
            },
            limits={
                "cpu": "1.5",
                "memory": "6Gi",
            }
        ),
    }

    def task_policy(task: BaseOperator):
        task_size = task.params.get("task_size")

        logger = logging.getLogger(__name__)
        logger.info("task_policy hook")
        if not task_size: # fall back to default worker size
            return
        if leafly_task_resource_sizes.get(task_size) is None:
            raise AirflowClusterPolicyViolation(f"task size '{task_size}' is not supported")
        task.executor_config = {
            "pod_override": k8s.V1Pod(
                metadata=k8s.V1ObjectMeta(
                    labels={
                        "airflow.k8s.leafly.io/task-size": task_size,
                    }
                ),
                spec=k8s.V1PodSpec(
                    containers=[
                        k8s.V1Container(
                            name="base",
                            resources=leafly_task_resource_sizes[task_size],
                        )
                    ]
                )
            )
        }
  airflowVersion: 2.10.4
  allowPodLaunching: true
  cleanup:
    enabled: false
  config:
    api:
      auth_backends: airflow.api.auth.backend.default
    celery:
      worker_concurrency: 16
    core:
      dags_are_paused_at_creation: "True"
      dags_folder: '{{ include "airflow_dags" . }}'
      donot_pickle: "True"
      encrypt_s3_logs: "False"
      execute_tasks_new_python_interpreter: "True"
      executor: '{{ .Values.executor }}'
      hide_sensitive_var_conn_fields: "True"
      hostname_callable: airflow.utils.net.get_host_ip_address
      load_examples: "False"
      parallelism: 16
      remote_log_conn_id: aws_s3
      remote_logging: "True"
    database:
      sql_alchemy_pool_recycle: 3600
    elasticsearch: null
    elasticsearch_configs: null
    kerberos: null
    kubernetes: null
    kubernetes_executor:
      worker_pods_creation_batch_size: 4
    logging:
      colored_console_log: "False"
      encrypt_s3_logs: "False"
      fab_logging_level: WARN
      logging_level: INFO
      remote_base_log_folder: s3://<REDACTED>
      remote_log_conn_id: aws_s3
      remote_logging: "True"
    metrics:
      statsd_host: $STATSD_HOST
      statsd_on: "True"
      statsd_port: 8125
      statsd_prefix: airflow
    scheduler:
      run_duration: -1
      task_queued_timeout: 600
    secrets:
      backend: airflow.providers.hashicorp.secrets.vault.VaultBackend
      backend_kwargs: '{"connections_path": "connections", "variables_path": "variables", "mount_point": "airflow", "url": "<REDACTED>", "auth_type": "kubernetes", "kubernetes_role": "airflow"}'
    sensors:
      default_timeout: 3600
    webserver:
      enable_proxy_fix: "True"
      expose_config: "True"
      rbac: "True"
  dags:
    gitSync:
      branch: master
      depth: 1
      enabled: true
      knownHosts: |
        github.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOMqqnkVzrm0SdG6UOoqKLsabgH5C9okWi0dh2l9GKJl
        github.com ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEmKSENjQEezOmxkZMy7opKgwFB9nkt5YRrYMjNuG5N87uRgg6CLrbo5wAdT/y6v0mKV0U2w0WZ2YB/++Tpockg=
        github.com ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCj7ndNxQowgcQnjshcLrqPEiiphnt+VTTvDP6mHBL9j1aNUkY4Ue1gvwnGLVlOhGeYrnZaMgRK6+PKCUXaDbC7qtbW8gIkhL7aGCsOr/C56SJMy/BCZfxd1nWzAOxSDPgVsmerOBYfNqltV9/hWCqBywINIR+5dIg6JTJ72pcEpEjcYgXkE2YEFXV1JHnsKgbLWNlhScqb2UmyRkQyytRLtL+38TGxkxCflmO+5Z8CSSNY7GidjMIZ7Q4zMjA2n1nGrlTDkzwDCsw+wqFPGQA179cnfGWOWRVruj16z6XyvxvjJwbz0wQZ75XK5tKSb7FNyeIEs4TT4jk+S4dhPeAUC5y+bDYirYgM4GC7uEnztnZyaVWQ7B381AK4Qdrwt51ZqExKbQpTUNn+EjqoTwvqNj4kqx5QUCI0ThS/YkOxJCXmPUWZbhjpCg56i+2aB6CmK2JGhn57K5mj0MNdBXA4/WnwH6XoPWJzK5Nyu2zB3nAZp+S5hpQs+p1vN1/wsjk=
      maxFailures: 5
      repo: <REDACTED>
      resources:
        limits:
          memory: 256Mi
        requests:
          cpu: 10m
          memory: 128Mi
      rev: HEAD
      sshKeySecret: airflow
      subPath: ""
    persistence:
      enabled: false
  data:
    brokerUrl: <REDACTED>
    metadataSecretName: airflow-metadata-db-url
    resultBackendSecretName: airflow-results-db-url
  defaultAirflowRepository: <REDACTED>
  defaultAirflowTag: <REDACTED>
  elasticsearch:
    enabled: false
  executor: KubernetesExecutor
  extraEnv: |
    - name: STATSD_HOST
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.hostIP
  fernetKeySecretName: airflow
  flower:
    enabled: true
    resources:
      limits:
        memory: 512Mi
      requests:
        cpu: 20m
        memory: 256Mi
    serviceAccount:
      create: true
  images:
    airflow:
      repository: null
      tag: null
    flower:
      pullPolicy: IfNotPresent
      repository: null
      tag: null
    pod_template:
      pullPolicy: IfNotPresent
      repository: null
      tag: null
  ingress:
    enabled: true
    flower:
      annotations:
        ingress.kubernetes.io/force-ssl-redirect: "true"
        ingress.kubernetes.io/rewrite-target: /
        kubernetes.io/ingress.class: nginx-internal
      host: <REDACTED>
    web:
      annotations:
        ingress.kubernetes.io/force-ssl-redirect: "true"
        kubernetes.io/ingress.class: nginx-internal
      host: <REDACTED>
  labels:
    tags.datadoghq.com/env: production
    tags.datadoghq.com/service: airflow
  logs:
    persistence:
      enabled: false
  migrateDatabaseJob:
    jobAnnotations:
      argocd.argoproj.io/hook: PreSync
    serviceAccount:
      annotations:
        argocd.argoproj.io/hook: PreSync
      create: true
  multiNamespaceMode: false
  pgbouncer:
    enabled: false
  postgresql:
    enabled: false
  rbac:
    create: false
  redis:
    enabled: false
  scheduler:
    livenessProbe:
      command:
      - sh
      - -c
      - |
        CONNECTION_CHECK_MAX_COUNT=0 AIRFLOW__LOGGING__LOGGING_LEVEL=ERROR exec /entrypoint \
        airflow jobs check --job-type SchedulerJob --hostname $(hostname -i)
    logGroomerSidecar:
      enabled: true
      resources:
        limits:
          memory: 256Mi
        requests:
          cpu: 10m
          memory: 64Mi
    nodeSelector:
      lifecycle: Spot
    podAnnotations:
      ad.datadoghq.com/scheduler.logs: '[{"source": "airflow", "service": "airflow"}]'
    replicas: 1
    resources:
      limits:
        cpu: 2
        memory: 6Gi
      requests:
        cpu: 2
        memory: 2Gi
    safeToEvict: true
    serviceAccount:
      create: true
    tolerations:
    - key: spot
      operator: Equal
      value: "true"
    - key: compute
      operator: Equal
      value: "true"
  statsd:
    enabled: false
  triggerer:
    livenessProbe:
      command:
      - sh
      - -c
      - |
        CONNECTION_CHECK_MAX_COUNT=0 AIRFLOW__LOGGING__LOGGING_LEVEL=ERROR exec /entrypoint \
        airflow jobs check --job-type TriggererJob --hostname $(hostname -i)
  webserver:
    allowPodLogReading: true
    defaultUser:
      enabled: false
    podAnnotations:
      ad.datadoghq.com/webserver.logs: '[{"source": "airflow", "service": "airflow"}]'
    replicas: 1
    resources:
      limits:
        memory: 3Gi
      requests:
        cpu: 500m
        memory: 1Gi
    serviceAccount:
      annotations:
        eks.amazonaws.com/role-arn: <REDACTED>
        eks.amazonaws.com/sts-regional-endpoints: "true"
        eks.amazonaws.com/token-expiration: "86400"
      create: true
  webserverSecretKeySecretName: airflow
  workers:
    keda:
      enabled: false
    logGroomerSidecar:
      resources: {}
    nodeSelector:
      lifecycle: Spot
    persistence:
      enabled: false
    podAnnotations:
      ad.datadoghq.com/worker.logs: '[{"source": "airflow", "service": "airflow"}]'
    resources:
      limits:
        cpu: 1
        memory: 6Gi
      requests:
        cpu: 200m
        memory: 1Gi
    safeToEvict: false
    serviceAccount:
      annotations:
        eks.amazonaws.com/role-arn: <REDACTED>
        eks.amazonaws.com/sts-regional-endpoints: "true"
        eks.amazonaws.com/token-expiration: "86400"
      create: true
    strategy:
      rollingUpdate:
        maxSurge: 100%
        maxUnavailable: 50%
    tolerations:
    - key: spot
      operator: Equal
      value: "true"
    - key: compute
      operator: Equal
      value: "true"
    updateStrategy: null
```
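
For context on the `task_policy` in `airflowLocalSettings` above: tasks opt into a size through operator `params`, which the policy reads when the task is loaded. A minimal sketch of a DAG that would trigger the `large` sizing (DAG and task names are illustrative, not from our actual DAGs):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Illustrative DAG: the cluster policy above picks up "task_size" from
# task.params and injects the matching resources via executor_config.
with DAG(dag_id="example_sized_dag", start_date=datetime(2025, 1, 1), schedule=None):
    BashOperator(
        task_id="heavy_step",
        bash_command="echo hello",
        params={"task_size": "large"},
    )
```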

# What happened
We recently switched to the `KubernetesExecutor` (the migration is still in progress, which is why some `CeleryExecutor` configuration options are still present). On every task we run, we get the warning mentioned in the title. I've added the surrounding messages from the same task pod, since I know that was requested previously on similar issues:

```
[2025-02-27T18:05:47.971+0000] {standard_task_runner.py:105} INFO - Job 2147416: Subtask <REDACTED>
[2025-02-27T18:05:48.075+0000] {task_command.py:467} INFO - Running <TaskInstance: <REDACTED> scheduled__2025-02-27T17:30:00+00:00 [running]> on host 100.99.130.23
[2025-02-27T18:05:48.395+0000] {pod_generator.py:477} WARNING - Model file  does not exist
[2025-02-27T18:05:48.441+0000] {taskinstance.py:3132} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='<REDACTED>' AIRFLOW_CTX_TASK_ID='<REDACTED>' AIRFLOW_CTX_EXECUTION_DATE='2025-02-27T17:30:00+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='scheduled__2025-02-27T17:30:00+00:00'
...
```

The tasks complete successfully, so there's effectively no functional issue; we just get this warning on every task that runs, which adds to our log volume. Tracing it down, the likely source seems to be [this line](https://github.com/apache/airflow/blob/c083e456fa02c6cb32cdbe0c9ed3c3b2380beccd/airflow/providers/cncf/kubernetes/pod_generator.py#L559)?

It's worth noting that there's an extra space between `file` and `does` in the `Model file  does not exist` line. That makes me think the aforementioned function is most likely being called with an empty argument.
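
That would be consistent with the warning template interpolating an empty path via `%s`. A minimal stdlib-logging sketch (not the provider's actual call site, just reproducing the formatting we observe):

```python
import logging

logging.basicConfig(format="%(message)s", level=logging.WARNING)
log = logging.getLogger(__name__)

path = ""  # what an empty/unset pod template path would look like
# An empty "%s" renders to nothing, leaving the double space seen in our logs.
log.warning("Model file %s does not exist", path)
# Output: Model file  does not exist
```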

# What you think should happen instead
The warning should not be printed at all.

# Deployment
Official Apache Airflow Helm Chart

# Deployment Details
Deployed on self-managed cluster in AWS

# Anything else
Initially I suspected it was because the worker pods **do not** have the 
`pod_template_file.yaml` mounted in their directories, so I added that using 
`workers.extraVolumeMounts` in the Helm chart, but that didn't seem to help. 

Additionally, there are a few similar discussions I've run across:
- https://github.com/apache/airflow/discussions/32043
- https://github.com/apache/airflow/discussions/35419

One of these seems to allude to Airflow possibly looking for a custom pod_template_file that's overridden per task in the executor config. The key difference here is that no path is output in our message, so I'm not sure what it would be looking for or why.
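
For reference, the per-task override those discussions describe is set via `executor_config`. Below is a hedged sketch with a hypothetical path; we don't set this anywhere in our DAGs or cluster policies:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(dag_id="example_template_override", start_date=datetime(2025, 1, 1), schedule=None):
    # Hypothetical per-task pod template override; we do NOT do this anywhere.
    BashOperator(
        task_id="templated_task",
        bash_command="echo hello",
        executor_config={"pod_template_file": "/opt/airflow/pod_templates/custom.yaml"},
    )
```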

Furthermore, I've been able to exec into the pods and `cat`/`ls`/`cd` the pod_template_file I mounted (and the directory it lives in) as the default `airflow` user, so I don't believe this is a file-permissions issue either.

Our Kubernetes provider version: `apache-airflow-providers-cncf-kubernetes==10.0.1`

Please let me know if there's additional information I can provide. Thank you in advance for any help 🙏🏾

# Are you willing to submit a PR?
- [X] If it is a Helm configuration issue or a fairly simple Python change, yes.

# Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

GitHub link: https://github.com/apache/airflow/discussions/47166
