shivaam opened a new pull request, #64085: URL: https://github.com/apache/airflow/pull/64085
When a deferred AWS task hits a terminal failure state, `async_wait()` raises `AirflowException` with the error details. But `AwsBaseWaiterTrigger.run()` did not catch it — the exception propagated to the triggerer framework which replaced it with a generic "Trigger failure" message. `execute_complete()` was never called, so operators and `on_failure_callback` lost all error context. **Fix:** Catch `AirflowException` in `run()` and yield `TriggerEvent(status="error", message=str(e))`, routing failures through `execute_complete()` where operators handle non-success events. Also adds `"JobRun.ErrorMessage"` to `GlueJobCompleteTrigger.status_queries` so the actual Glue error text is included alongside `JobRunState`. related: https://github.com/apache/airflow/discussions/63706 ## Tested on EC2 with real Glue job Ran a Glue job designed to fail with `deferrable=True`. Before the fix, `on_failure_callback` received `TaskDeferralError("Trigger failure")`. After: ``` AirflowException: Error in glue job: { 'status': 'error', 'message': 'AWS Glue job failed.: FAILED - RuntimeError: GLUE_TEST: Intentional failure to test error propagation in deferrable mode\nWaiter job_complete failed: Waiter encountered a terminal failure state: For expression "JobRun.JobRunState" we matched expected path: "FAILED"', 'run_id': 'jr_a485d84c34f952e1c8d9d3199455d2d6eda07529034ceb7e1b554514d367f417' } ``` `execute_complete()` was called, error details preserved, `on_failure_callback` received the actual Glue error. ## Operator compatibility audit — feedback requested This change means `execute_complete()` will now be called with `status="error"` events where previously it was never called on failure. I audited all 59 `execute_complete` methods across AWS operators and sensors: - **47 are safe** — use `validated_event["status"] != "success"` with raise - **4 not affected** — their triggers override `run()` with own error handling - **8 have missing error handling** — would silently succeed on error events: | Operator | Issue | |----------|-------| | [DmsDeleteReplicationConfigOperator](https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/operators/dms.py#L512) | No status check | | [DmsStartReplicationOperator](https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/operators/dms.py#L707) | No status check | | [DmsStopReplicationOperator](https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/operators/dms.py#L800) | No status check | | [NeptuneStartDbClusterOperator](https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/operators/neptune.py#L189) | Reads status but never validates | | [NeptuneStopDbClusterOperator](https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/operators/neptune.py#L316) | Reads status but never validates | | [EmrServerlessStopApplicationOperator](https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/operators/emr.py#L1636) | `== "success"` with no else/raise | | [EmrServerlessDeleteApplicationOperator](https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/operators/emr.py#L1743) | `== "success"` with no else/raise | | [MwaaTaskSensor](https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/sensors/mwaa.py#L283) | `return None` unconditionally | **These are pre-existing bugs** (they never validated the event status), but merging the base trigger fix without addressing them would change their failure mode from "Trigger failure" crash to silent success — which is worse. **Reviewers:** Does the `TriggerEvent(status="error")` approach look right? If so, I'll add commits to fix all 8 operators in this PR before merge. --- ##### Was generative AI tooling used to co-author this PR? - [X] Yes — Claude Code (Claude Opus 4.6) Generated-by: Claude Code (Claude Opus 4.6) following [the guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
