karenbraganz opened a new issue, #61180:
URL: https://github.com/apache/airflow/issues/61180

   ### Description
   
   When `wait_for_completion=True` is set on `EmrCreateJobFlowOperator`, the 
operator does not wait for the cluster to complete before returning a success 
state. Instead, it returns success as soon as the cluster starts running, so 
the task can succeed even if the cluster later terminates with errors. 
   
   I believe this is due to [this line of 
code](https://github.com/apache/airflow/blob/1b3329eb670a1fbf70d2f7eeaa21aaf7baa7bacd/providers/amazon/src/airflow/providers/amazon/aws/operators/emr.py#L761)
 that assigns the `WAIT_FOR_COMPLETION` WaitPolicy to the waiter. This 
corresponds to the ["job_flow_waiting" wait 
policy](https://github.com/apache/airflow/blob/1b3329eb670a1fbf70d2f7eeaa21aaf7baa7bacd/providers/amazon/src/airflow/providers/amazon/aws/utils/waiter.py#L103),
 with which the waiter only waits for the cluster to start running before 
returning a success state. 
   
   To make the waiter wait until the cluster completes, the 
[`WAIT_FOR_STEPS_COMPLETION` policy, which corresponds to 
"job_flow_terminated",](https://github.com/apache/airflow/blob/1b3329eb670a1fbf70d2f7eeaa21aaf7baa7bacd/providers/amazon/src/airflow/providers/amazon/aws/utils/waiter.py#L104)
 must be used. The operator hard-codes the "job_flow_waiting" wait policy, so 
the user cannot configure it. 
   
   I propose adding a wait_policy parameter to the operator which allows the 
user to specify which wait policy they would prefer to use.
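
A minimal sketch of how the proposed parameter could select a waiter. It does not import Airflow: the `WaitPolicy` enum and the mapping below are local stand-ins that mirror the names discussed in this issue, the enum string values and the waiter spellings `job_flow_waiting` / `job_flow_terminated` are assumptions about the linked mapping, and `resolve_waiter_name` is a hypothetical helper, not existing operator code.

```python
from __future__ import annotations

from enum import Enum


class WaitPolicy(str, Enum):
    """Local stand-in for the provider's WaitPolicy enum (values are assumed)."""

    WAIT_FOR_COMPLETION = "wait_for_completion"
    WAIT_FOR_STEPS_COMPLETION = "wait_for_steps_completion"


# Policy -> waiter name, as described in this issue (spellings are assumed).
WAITER_POLICY_NAME_MAPPING = {
    WaitPolicy.WAIT_FOR_COMPLETION: "job_flow_waiting",
    WaitPolicy.WAIT_FOR_STEPS_COMPLETION: "job_flow_terminated",
}


def resolve_waiter_name(
    wait_for_completion: bool,
    wait_policy: WaitPolicy | None = None,
) -> str | None:
    """Hypothetical helper: return the waiter to use, or None if not waiting."""
    if not wait_for_completion:
        return None
    # Fall back to the currently hard-coded policy so existing DAGs
    # keep their behaviour when wait_policy is not passed.
    policy = wait_policy or WaitPolicy.WAIT_FOR_COMPLETION
    return WAITER_POLICY_NAME_MAPPING[policy]
```

With this shape, a user who needs the task to fail when the cluster terminates with errors would pass `wait_policy=WaitPolicy.WAIT_FOR_STEPS_COMPLETION`, while omitting the argument keeps the current behaviour.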
   
   ### Use case/motivation
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
