Shlomit-B opened a new pull request, #52769:
URL: https://github.com/apache/airflow/pull/52769

   ### Summary
   Adds full support for AWS Systems Manager (SSM) Run Command in Apache Airflow, including:
   
   - `SsmRunCommandOperator`: sends commands to target resources via SSM
   - `SsmRunCommandCompletedSensor`: waits for command completion using `list_command_invocations`
     - Checks the status of every target resource for the given command.
     - Waits until all resources have completed the command successfully.
     - Fails immediately if any resource reports a failure state, giving faster and clearer feedback on execution issues.
   - `SsmRunCommandTrigger`: enables deferrable execution
   - `SsmCommandWaiter`: internal waiter for the command-completion logic
   - Unit tests for each component: operator, sensor, trigger, and waiter
   - System test that exercises the full flow on a live EC2 instance
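For reviewers who want the send-then-poll flow at a glance, here is a minimal, dependency-free sketch. It assumes the operator wraps boto3's `send_command` and that the sensor pages through `list_command_invocations` by `NextToken`. The helper names `send`, `iter_invocations`, and `poke`, the `AWS-RunShellScript` document, and raising `RuntimeError` (instead of an Airflow exception) are illustrative choices, not the code in this PR.

```python
from typing import Any, Iterator, Protocol

# Invocation statuses that ListCommandInvocations can report.
SUCCESS_STATES = {"Success"}
FAILURE_STATES = {"Cancelled", "TimedOut", "Failed"}


class SsmClient(Protocol):
    """The two boto3 SSM client methods this sketch relies on."""

    def send_command(self, **kwargs: Any) -> dict: ...

    def list_command_invocations(self, **kwargs: Any) -> dict: ...


def send(client: SsmClient, instance_ids: list[str], commands: list[str]) -> str:
    """Run shell commands on the given instances and return the SSM command ID."""
    response = client.send_command(
        InstanceIds=instance_ids,
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": commands},
    )
    return response["Command"]["CommandId"]


def iter_invocations(client: SsmClient, command_id: str) -> Iterator[dict]:
    """Yield every invocation of a command, following NextToken pagination."""
    kwargs: dict[str, Any] = {"CommandId": command_id}
    while True:
        page = client.list_command_invocations(**kwargs)
        yield from page.get("CommandInvocations", [])
        token = page.get("NextToken")
        if not token:
            return
        kwargs["NextToken"] = token


def poke(client: SsmClient, command_id: str) -> bool:
    """True once every target succeeded, False to keep waiting; raise on any failure."""
    invocations = list(iter_invocations(client, command_id))
    for inv in invocations:
        # Fail fast: a single failed target fails the whole check.
        if inv["Status"] in FAILURE_STATES:
            raise RuntimeError(
                f"Command {command_id} ended as {inv['Status']} on {inv['InstanceId']}"
            )
    # An empty list can mean SSM has not registered the invocations yet.
    return bool(invocations) and all(inv["Status"] in SUCCESS_STATES for inv in invocations)
```

In the real sensor the client would presumably come from an AWS hook; here any object exposing those two methods works, which also makes the logic easy to unit-test with a stub.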
   
   ### Also included
   - Refactored an EC2 task function into `ec2_utils.py` for reuse between tests
   - Added an `iam.py` utility file under system tests with a `create_iam_role` helper function
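The system test needs an EC2 instance that SSM can manage, which in practice means the instance's role grants the SSM Agent access, typically via the AWS-managed `AmazonSSMManagedInstanceCore` policy. The PR does not show the signature of `create_iam_role`, so the following is only a hypothetical sketch: `ec2_trust_policy` and `create_ssm_instance_role` are illustrative names, built on the boto3 IAM calls `create_role` and `attach_role_policy`.

```python
import json
from typing import Any

# AWS-managed policy that lets the SSM Agent on an instance talk to Systems Manager.
SSM_CORE_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"


def ec2_trust_policy() -> dict[str, Any]:
    """Trust policy that allows EC2 instances to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_ssm_instance_role(iam_client: Any, role_name: str) -> str:
    """Hypothetical helper: create an EC2-assumable role with SSM access; return its ARN."""
    role = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(ec2_trust_policy()),
    )
    iam_client.attach_role_policy(RoleName=role_name, PolicyArn=SSM_CORE_POLICY_ARN)
    return role["Role"]["Arn"]
```

To attach the role to an instance, EC2 additionally needs an instance profile containing it (`create_instance_profile` and `add_role_to_instance_profile` in boto3).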
   
   ### Sensor design choices and rationale
   When implementing `SsmRunCommandCompletedSensor`, I considered the following approaches for handling command status across multiple target resources:
   
   - Checking only one resource's status was dismissed early on. In most cases, users expect the sensor to reflect the overall success of the command: if even one instance fails, they likely wouldn't want downstream tasks to continue.
   - (Chosen) Fail immediately if any resource fails. This approach gives faster feedback and follows the fail-fast principle: it prevents downstream tasks from running when even a single resource didn't succeed, which matches real-world expectations in multi-instance workflows.
   
   This behavior keeps a partial failure from being reported as success, which suits common production uses of SSM Run Command, where a single command targets many instances.
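The fail-fast rule maps naturally onto botocore's waiter acceptors, which is one plausible way for `SsmCommandWaiter` to express it. The model below is a hedged sketch in botocore's version-2 waiter format; the waiter name `run_command_completed` and the `delay`/`maxAttempts` values are placeholders, and `evaluate` is a simplified, dependency-free stand-in for botocore's JMESPath-based matching, written only to show how acceptor order produces fail-fast behavior.

```python
from typing import Any

_STATUS_PATH = "CommandInvocations[].Status"

# Hypothetical waiter model in botocore's version-2 format.
# Acceptors are checked in order; the first one that matches decides the state.
SSM_WAITER_MODEL: dict[str, Any] = {
    "version": 2,
    "waiters": {
        "run_command_completed": {  # placeholder waiter name
            "operation": "ListCommandInvocations",
            "delay": 5,
            "maxAttempts": 120,
            "acceptors": [
                {"matcher": "pathAll", "argument": _STATUS_PATH,
                 "expected": "Success", "state": "success"},
                # Any single terminal failure ends the wait immediately (fail fast).
                {"matcher": "pathAny", "argument": _STATUS_PATH,
                 "expected": "Failed", "state": "failure"},
                {"matcher": "pathAny", "argument": _STATUS_PATH,
                 "expected": "TimedOut", "state": "failure"},
                {"matcher": "pathAny", "argument": _STATUS_PATH,
                 "expected": "Cancelled", "state": "failure"},
            ],
        }
    },
}


def evaluate(acceptors: list[dict[str, Any]], response: dict[str, Any]) -> str:
    """Simplified stand-in for botocore's acceptor matching on one response."""
    statuses = [inv["Status"] for inv in response.get("CommandInvocations", [])]
    for acceptor in acceptors:
        if acceptor["argument"] != _STATUS_PATH:
            raise ValueError("this sketch only evaluates the invocation status path")
        if acceptor["matcher"] == "pathAll":
            # Like botocore, pathAll never matches an empty list.
            matched = bool(statuses) and all(s == acceptor["expected"] for s in statuses)
        elif acceptor["matcher"] == "pathAny":
            matched = any(s == acceptor["expected"] for s in statuses)
        else:
            raise ValueError(f"unsupported matcher: {acceptor['matcher']}")
        if matched:
            return acceptor["state"]
    return "waiting"  # no acceptor matched: sleep `delay` seconds and poll again
```

One caveat worth checking in review: a botocore waiter evaluates each response on its own and does not follow `NextToken`, so a command with more invocations than fit in one page could be judged on the first page only.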
   
   ### Open question for reviewers
   Would it make sense to make the sensor behavior configurable?
   For example, a parameter like `fail_on_any_failure=True` could let users choose whether to fail immediately or to wait for all command invocations to complete before deciding.
   
   Currently, the sensor always fails as soon as one resource reports a failure.
   This seems like the safer default, but I'm open to making it user-configurable if that would better serve broader use cases.
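If the proposed flag were added, the per-poke decision could look like this minimal sketch. `fail_on_any_failure` is the name suggested above; `decide` and the status sets are hypothetical helpers that take the invocation statuses from one poke and return `True` (done), `False` (keep poking), or raise.

```python
# Statuses from SSM's ListCommandInvocations; "Cancelling" and "Delayed" are not terminal.
FAILURE_STATES = {"Cancelled", "TimedOut", "Failed"}
TERMINAL_STATES = FAILURE_STATES | {"Success"}


def decide(statuses: list[str], fail_on_any_failure: bool = True) -> bool:
    """Return True when every invocation succeeded, False to keep poking, or raise.

    fail_on_any_failure=True:  raise as soon as any invocation has failed.
    fail_on_any_failure=False: wait until every invocation is terminal, then
                               raise if any of them failed.
    """
    if not statuses:
        return False  # SSM may not have registered the invocations yet
    failed = [s for s in statuses if s in FAILURE_STATES]
    if failed and fail_on_any_failure:
        raise RuntimeError(f"{len(failed)} invocation(s) failed so far: {failed}")
    if not all(s in TERMINAL_STATES for s in statuses):
        return False  # some targets are still Pending, InProgress, Delayed, ...
    if failed:
        raise RuntimeError(f"{len(failed)} of {len(statuses)} invocation(s) failed")
    return True
```

The trade-off: with `fail_on_any_failure=False` the task keeps running until every target is terminal, so its failure can report the complete set of failed targets, at the cost of slower feedback.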
   
   closes: #42619 
   
   ---
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#pull-request-guidelines)** for more information.
   In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes, please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [airflow-core/newsfragments](https://github.com/apache/airflow/tree/main/airflow-core/newsfragments).
   


-- 
This is an automated message from the Apache Git Service.