kaxil opened a new pull request, #67919:
URL: https://github.com/apache/airflow/pull/67919

   The triggerer's supervisor already serves the runtime requests its triggers 
make (connections, variables, XComs, task states) through the Execution API, 
but today it does so in-process against the metadata database. This change 
points that client at the api-server over HTTP and moves trigger orchestration 
onto the API as well, so the triggerer process holds no metadata-database 
connection: it registers and heartbeats a liveness Job, claims triggers, 
fetches their `RunTrigger` workloads, and reports events, failures, and cleanup 
over the API. The api-server becomes the only process that touches the 
database. Implements the triggerer portion of AIP-92.
   
   ## What it adds
   
   New Execution API endpoints, gated behind a new API version:
   
   - `POST /execution/triggers/load`, `/workloads`, `/{id}/event`, 
`/{id}/failure`, `/cleanup`
   - `POST /execution/jobs`, `/jobs/{id}/heartbeat`, `/jobs/{id}/complete`
   
   The workload-building logic is shared between the `/workloads` endpoint and 
the in-process builder, so the server and the triggerer produce identical 
workloads.
   
   ## Design notes
   
   - **Event payloads are not deserialized in the api-server.** A fired event's 
payload is serde-encoded by the triggerer and spliced into the task instance's 
`next_kwargs` without being deserialized server-side, so the trusted api-server 
never runs `import_string` on a worker-supplied body. The worker deserializes 
it on resume. Asset and callback consumers, which need the live object, 
deserialize via serde, which is allowlist-gated by `[core] 
allowed_deserialization_classes`.
   - **Resilient run loop.** A transient API error re-queues the event or 
failure and retries next cycle rather than tearing down all running triggers; a 
non-transient 4xx surfaces loudly instead of looping silently at WARNING.
   - **No metadata Job row written by the triggerer.** The command runs the job 
runner directly instead of through `run_job` (whose `prepare_for_execution` 
would write a Job row at startup) and skips the startup database checks. The 
liveness Job is registered, heartbeated, and finalized over the API instead.
   - **Trigger logs stay retrievable.** The supervisor registers a per-trigger 
logging factory for each fetched workload, keys the log filename on the 
registered Job id so it matches what the reader looks for, and records the 
triggerer's own hostname on the Job.
   
   ## Gotchas
   
   - The wire carries the event payload but not its event type, so a 
`start_from_trigger` trigger that emits a `BaseTaskEndEvent` is currently 
handled as a generic resume rather than a terminal event. Worth a follow-up.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to