GitHub user Urus1201 added a comment to the discussion: Is the `DAG Processor` needed if our dags are static?
The DAG Processor is still needed even for static DAGs — but you can make it nearly invisible. Here's why, and the right config to use:

## Why you still need it

Airflow's scheduler does **not** read DAG Python files directly — it reads from **serialized DAGs** stored in the metastore. The DAG Processor's job is to:

1. Parse DAG files and write their serialized form to the DB
2. Detect newly added or removed DAGs
3. Keep `dag_run` and `task_instance` records in sync with the DAG structure

Without it, the scheduler has no DAG definitions to work with (even if your DAGs never change).

## Can you set `parsing_processes: 0`?

In Airflow 3, setting `parsing_processes: 0` **disables** DAG file parsing entirely. This means:

- No new DAGs are discovered after startup
- No DAG structure changes are reflected
- If the metastore already has serialized DAGs from a previous parse, they continue to run

For a strict "update only on new release" workflow this is technically viable, but then you **must** trigger a processor restart (or briefly set `parsing_processes` back to a non-zero value) as part of your release process to re-parse the updated DAGs.

## Recommended configuration for static DAGs

```ini
[dag_processor]
# Re-parse at most once per hour (DAGs never change between releases)
min_file_process_interval = 3600
# Only re-scan for new/removed files every 10 minutes
refresh_interval = 600
# Keep the stale threshold high (don't mark DAGs stale too quickly)
stale_dag_threshold = 7200
# Only scan files modified since the last scan (a huge win for static setups)
file_parsing_sort_mode = modified_time
# Use fewer processes, since reparsing is rare
parsing_processes = 2

[core]
# Skip checking imports at discovery time — safe for known-good static DAGs
dag_discovery_safe_mode = False
```

With `file_parsing_sort_mode = modified_time`, Airflow will only re-parse files whose `mtime` has changed since the last scan.
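As a rough illustration of that behavior, here is a toy sketch of mtime-based skipping — a simplified model for intuition, **not** Airflow's actual `DagFileProcessorManager` code (the function name and bookkeeping dict are made up for this example):

```python
import os
import tempfile
import time


def files_needing_reparse(dag_files, last_parse_times):
    """Toy model: a file is re-parsed only if its mtime is newer than the
    time we last parsed it. Simplified sketch, not Airflow internals."""
    stale = []
    for path in dag_files:
        mtime = os.path.getmtime(path)
        if mtime > last_parse_times.get(path, 0.0):
            stale.append(path)
    return stale


# Demo: two "DAG files"; only one is rewritten after the initial parse.
with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "dag_a.py")
    b = os.path.join(d, "dag_b.py")
    for p in (a, b):
        with open(p, "w") as f:
            f.write("# static DAG\n")

    # Initial scan: everything is new, so everything is parsed.
    assert files_needing_reparse([a, b], {}) == [a, b]

    now = time.time()
    parsed_at = {a: now, b: now}

    # Simulate a release that touches only dag_a.py.
    os.utime(a, (now + 10, now + 10))
    assert files_needing_reparse([a, b], parsed_at) == [a]
```

For a truly static deployment, the second scan returns an empty list release after release, which is where the near-zero overhead comes from.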
For truly static DAGs, this means almost zero re-parsing overhead between releases — just the initial parse at startup.

## In your Helm chart release workflow

Since you rebuild the Docker image on each release (new `mtime` on all files), the processor will re-parse everything on the next pod start. This is exactly the right behavior: parse once at startup, then idle until the next release.

GitHub link: https://github.com/apache/airflow/discussions/64287#discussioncomment-16503698
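If you manage Airflow with the official `apache-airflow` Helm chart, one way to carry these settings is the chart's `config` value, which the chart renders into `airflow.cfg` — a sketch assuming that mechanism and the ini keys above:

```yaml
# values.yaml fragment (assumes the official apache-airflow Helm chart's
# `config` value, which maps sections/keys into airflow.cfg)
config:
  dag_processor:
    min_file_process_interval: 3600
    refresh_interval: 600
    stale_dag_threshold: 7200
    file_parsing_sort_mode: modified_time
    parsing_processes: 2
  core:
    dag_discovery_safe_mode: "False"
```

Keeping the settings in `values.yaml` means each release ships its parsing policy alongside the image tag, so the re-parse-on-pod-start behavior described above stays reproducible.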
