GitHub user Urus1201 added a comment to the discussion: Is the `DAG Processor` needed if our dags are static?
The DAG Processor is still needed even for static DAGs — but you can make it nearly invisible. Here's why, and the right config to use:

## Why you still need it

Airflow's scheduler does **not** read DAG Python files directly — it reads from **serialized DAGs** stored in the metastore. The DAG Processor's job is to:

1. Parse DAG files and write their serialized form to the DB
2. Detect newly added or removed DAGs
3. Keep `dag_run` and `task_instance` records in sync with the DAG structure

Without it, the scheduler has no DAG definitions to work with (even if your DAGs never change).

## Can you set `parsing_processes: 0`?

In Airflow 3, setting `parsing_processes: 0` **disables** DAG file parsing entirely. This means:

- No new DAGs are discovered after startup
- No DAG structure changes are reflected
- If the metastore already has serialized DAGs from a previous parse, they continue to run

For a strict "update only on new release" workflow this is technically viable, but then you **must** trigger a processor restart (or briefly set `parsing_processes` back to a non-zero value) as part of your release process to re-parse the updated DAGs.

## Recommended configuration for static DAGs

```ini
[dag_processor]
# Re-parse at most once per hour (DAGs never change between releases)
min_file_process_interval = 3600
# Only re-scan for new/removed files every 10 minutes
refresh_interval = 600
# Keep the stale threshold high (don't mark DAGs stale too quickly)
stale_dag_threshold = 7200
# Only scan files modified since the last scan (a huge win for static setups)
file_parsing_sort_mode = modified_time
# Use fewer processes, since reparsing is rare
parsing_processes = 2

[core]
# Skip checking imports at discovery time — safe for known-good static DAGs
dag_discovery_safe_mode = False
```

With `file_parsing_sort_mode = modified_time`, Airflow will only re-parse files whose `mtime` has changed since the last scan.
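As a rough illustration of that behavior, here is a toy sketch of mtime-based skipping — a simplified model for intuition, **not** Airflow's actual `DagFileProcessorManager` code (the function name and bookkeeping dict are made up for this example):

```python
import os
import tempfile
import time


def files_needing_reparse(dag_files, last_parse_times):
    """Toy model: a file is re-parsed only if its mtime is newer than the
    time we last parsed it. Simplified sketch, not Airflow internals."""
    stale = []
    for path in dag_files:
        mtime = os.path.getmtime(path)
        if mtime > last_parse_times.get(path, 0.0):
            stale.append(path)
    return stale


# Demo: two "DAG files"; only one is rewritten after the initial parse.
with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "dag_a.py")
    b = os.path.join(d, "dag_b.py")
    for p in (a, b):
        with open(p, "w") as f:
            f.write("# static DAG\n")

    # Initial scan: everything is new, so everything is parsed.
    assert files_needing_reparse([a, b], {}) == [a, b]

    now = time.time()
    parsed_at = {a: now, b: now}

    # Simulate a release that touches only dag_a.py.
    os.utime(a, (now + 10, now + 10))
    assert files_needing_reparse([a, b], parsed_at) == [a]
```

For a truly static deployment, the second scan returns an empty list release after release, which is where the near-zero overhead comes from.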
For truly static DAGs, this means almost zero re-parsing overhead between releases — just the initial parse at startup.

## In your Helm chart release workflow

Since you rebuild the Docker image on each release (new `mtime` on all files), the processor will re-parse everything on the next pod start. This is exactly the right behavior: parse once at startup, then idle until the next release.

GitHub link: https://github.com/apache/airflow/discussions/64287#discussioncomment-16503698
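If you manage Airflow with the official `apache-airflow` Helm chart, one way to carry these settings is the chart's `config` value, which the chart renders into `airflow.cfg` — a sketch assuming that mechanism and the ini keys above:

```yaml
# values.yaml fragment (assumes the official apache-airflow Helm chart's
# `config` value, which maps sections/keys into airflow.cfg)
config:
  dag_processor:
    min_file_process_interval: 3600
    refresh_interval: 600
    stale_dag_threshold: 7200
    file_parsing_sort_mode: modified_time
    parsing_processes: 2
  core:
    dag_discovery_safe_mode: "False"
```

Keeping the settings in `values.yaml` means each release ships its parsing policy alongside the image tag, so the re-parse-on-pod-start behavior described above stays reproducible.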
