kaxil opened a new pull request, #67919:
URL: https://github.com/apache/airflow/pull/67919
The triggerer's supervisor already serves the runtime requests its triggers
make (connections, variables, XComs, task states) through the Execution API,
but today it does so in-process against the metadata database. This change
points that client at the api-server over HTTP and moves trigger orchestration
onto the API as well, so the triggerer process holds no metadata-database
connection: it registers and heartbeats a liveness Job, claims triggers,
fetches their `RunTrigger` workloads, and reports events, failures, and cleanup
over the API. The api-server becomes the only process that touches the
database. Implements the triggerer portion of AIP-92.
## What it adds
New Execution API endpoints, gated behind a new API version:
- `POST /execution/triggers/load`, `/workloads`, `/{id}/event`,
`/{id}/failure`, `/cleanup`
- `POST /execution/jobs`, `/jobs/{id}/heartbeat`, `/jobs/{id}/complete`
The workload-building logic is shared between the `/workloads` endpoint and
the in-process builder, so the server and the triggerer produce identical
workloads.
## Design notes
- **Event payloads are not deserialized in the api-server.** A fired event's
payload is serde-encoded by the triggerer and spliced into the task instance's
`next_kwargs` without being deserialized server-side, so the trusted api-server
never runs `import_string` on a worker-supplied body. The worker deserializes
it on resume. Asset and callback consumers, which need the live object,
deserialize via serde, which is allowlist-gated by `[core]
allowed_deserialization_classes`.
- **Resilient run loop.** A transient API error re-queues the event or
failure and retries next cycle rather than tearing down all running triggers; a
non-transient 4xx surfaces loudly instead of looping silently at WARNING.
- **No metadata Job row written by the triggerer.** The command runs the job
runner directly instead of through `run_job` (whose `prepare_for_execution`
would write a Job row at startup) and skips the startup database checks. The
liveness Job is registered, heartbeated, and finalized over the API instead.
- **Trigger logs stay retrievable.** The supervisor registers a per-trigger
logging factory for each fetched workload, keys the log filename on the
registered Job id so it matches what the reader looks for, and records the
triggerer's own hostname on the Job.
## Gotchas
- The wire carries the event payload but not its event type, so a
`start_from_trigger` trigger that emits a `BaseTaskEndEvent` is currently
handled as a generic resume rather than a terminal event. Worth a follow-up.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]