kaxil opened a new pull request, #67878:
URL: https://github.com/apache/airflow/pull/67878
Once this PR is merged, the standalone DAG processor (`airflow dag-processor`) will no longer connect to the metadata database directly. It will persist parse results and read all metadata through the API server, the same way workers already operate. This removes one of the last remaining components that run user-adjacent code while also holding a direct database connection.

Persistence (serialized DAGs, import errors, warnings), stale-DAG and orphaned-import-error reconciliation, bundle sync and state, priority-parse-request and callback claiming, and the processor's own `Job` liveness record all go through a new `/dag-processing` API app. Parse-time and bundle-initialization `Connection`/`Variable` reads resolve through the Execution API.

## What changed
- New `/dag-processing` FastAPI sub-app mounted on the API server (`airflow.api_fastapi.dag_processing`), split into `app.py` (routes), `datamodels.py`, and `security.py`.
- New `DagProcessingApiClient` (httpx-based) used by the processor, with connection pooling, bounded retry/backoff, and a startup readiness wait.
- `DagFileProcessorManager` routes all persistence and metadata reads through the client. Bundle-initialization credentials resolve through the Execution API (the same path workers and triggerers use), so a git connection stored in the metadata database keeps working without direct DB access.
- New config options `[core] dag_processing_api_server_url` (defaults to the `/dag-processing` mount of the configured API server) and `[dag_processor] jwt_audience`.

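The client's bounded retry/backoff and startup readiness wait can be sketched as below. This is a minimal, library-agnostic sketch, not the code in this PR: `call_with_retry`, `wait_until_ready`, and `TransientApiError` are hypothetical names, the delay constants are illustrative assumptions, and the real client issues its requests through httpx rather than plain callables. `sleep` and `clock` are injectable so the behavior can be exercised without real waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientApiError(Exception):
    """Hypothetical stand-in for a retryable failure (connection error, 502/503/504)."""


def call_with_retry(
    fn: Callable[[], T],
    *,
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call ``fn``, retrying transient failures with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientApiError:
            if attempt == max_attempts:
                raise  # retries exhausted: let the caller skip this cycle or fail
            # Delays grow 0.5s, 1s, 2s, 4s, ... but never exceed max_delay.
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")


def wait_until_ready(
    probe: Callable[[], bool],
    *,
    timeout: float = 60.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll ``probe`` until it returns True; give up once ``timeout`` elapses."""
    deadline = clock() + timeout
    while True:
        try:
            if probe():
                return True
        except TransientApiError:
            pass  # API server not accepting connections yet; keep polling
        if clock() >= deadline:
            return False
        sleep(interval)
```

In this sketch, a probe wrapping a GET to the unauthenticated `/health` route would be passed to `wait_until_ready` before the processor's main loop starts, and each persistence call would be wrapped in `call_with_retry`. Capping the delay keeps a long outage from stretching the gap between attempts beyond `max_delay`.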
## Breaking change
The direct-database path is removed: the DAG processor now requires a reachable API server that mounts the `dag-processing` app (`airflow api-server --apps all`, or an `--apps` selection that includes `dag-processing`). A deployment that previously ran `airflow dag-processor` with only a database connection must now also run the API server. See the newsfragment.

## Design notes
- **Auth.** The processor self-signs a token for `[dag_processor] jwt_audience` with the deployment signing key, and the endpoints validate it via `JWTBearer`. Validation goes through the same `get_sig_validation_args` path as the Execution API, so a deployment that configures `[api_auth] trusted_jwks_url` validates externally issued tokens for `/dag-processing` exactly as it does for `/execution`. `/health` stays unauthenticated for readiness probes.
- **Resilience.** Per-loop API calls are guarded so that a transient API outage skips a cycle instead of crashing the processor, the heartbeat is throttled, and startup waits for the API to become ready.

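To illustrate the audience check in the **Auth** note, the following sketch signs and validates an HS256 token using only the standard library. It is not Airflow's implementation: the PR validates through `JWTBearer` and `get_sig_validation_args`, a real deployment would use a JWT library (e.g. PyJWT), and the function names, HS256 choice, claim set, and TTL here are all assumptions. What it shows is that a token minted for one audience is rejected by an endpoint expecting another, even when both share the same signing key.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional


class InvalidToken(Exception):
    """Raised when a token fails signature, audience, or expiry checks."""


def _b64url(data: bytes) -> str:
    # JWTs use unpadded base64url (RFC 7515).
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the stripped padding before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_token(secret: bytes, audience: str, subject: str,
               ttl: int = 600, now: Optional[float] = None) -> str:
    """Self-sign a short-lived HS256 token bound to ``audience`` (hypothetical helper)."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": subject, "aud": audience, "iat": issued, "exp": issued + ttl}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url(_sign(secret, signing_input))}"


def validate_token(token: str, secret: bytes, audience: str,
                   now: Optional[float] = None) -> Dict[str, Any]:
    """Check signature, then audience, then expiry; return the claims if all pass."""
    parts = token.split(".")
    if len(parts) != 3:
        raise InvalidToken("malformed token")
    header_b64, claims_b64, sig_b64 = parts
    expected = _sign(secret, f"{header_b64}.{claims_b64}")
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise InvalidToken("bad signature")
    claims = json.loads(_b64url_decode(claims_b64))
    # This sketch only accepts a single-string "aud" claim.
    if claims.get("aud") != audience:
        raise InvalidToken("wrong audience")
    current = time.time() if now is None else now
    if claims.get("exp", 0) <= current:
        raise InvalidToken("expired")
    return claims
```

Checking the signature before reading any claims means a forged or re-signed token never reaches the audience comparison. The example audience string below is the `jwt_audience` value from the config example; any other audience string stands in, hypothetically, for a different API.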
## Config
```ini
[core]
# optional; defaults to the /dag-processing sibling mount of execution_api_server_url
dag_processing_api_server_url = http://api-server:8080/dag-processing
[dag_processor]
# optional; mirrors [execution_api] jwt_audience
jwt_audience = urn:airflow.apache.org:dag-processing
```
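The `dag_processing_api_server_url` default described in the config comment can be approximated as follows. This is a hedged sketch that assumes the default is formed by replacing the last path segment of `execution_api_server_url` with `dag-processing`. The function name is hypothetical, and the PR's actual derivation may handle edge cases (query strings, custom path prefixes) differently.

```python
from urllib.parse import urlsplit, urlunsplit


def default_dag_processing_url(execution_api_server_url: str,
                               mount: str = "dag-processing") -> str:
    """Swap the Execution API mount for its /dag-processing sibling (hypothetical helper)."""
    parts = urlsplit(execution_api_server_url)
    # Split the path into non-empty segments, e.g. "/prefix/execution/" -> ["prefix", "execution"].
    segments = [s for s in parts.path.split("/") if s]
    # Drop the last segment (the Execution API mount) and append the sibling mount.
    path = "/" + "/".join(segments[:-1] + [mount])
    # Keep scheme and host:port; discard any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

Under this assumption, `default_dag_processing_url("http://api-server:8080/execution/")` returns `"http://api-server:8080/dag-processing"`, which matches the explicit value in the config example, so setting the option is only needed when the API server is reached at a different address.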