baolsen commented on a change in pull request #11883:
URL: https://github.com/apache/airflow/pull/11883#discussion_r517896190
##########
File path: airflow/providers/amazon/aws/sensors/emr_step.py
##########
@@ -21,24 +21,34 @@
from airflow.providers.amazon.aws.sensors.emr_base import EmrBaseSensor
from airflow.utils.decorators import apply_defaults
+from airflow.exceptions import AirflowException
+from airflow.models.xcom import XCOM_RETURN_KEY
+
class EmrStepSensor(EmrBaseSensor):
"""
Asks for the state of the step until it reaches any of the target states.
+ If no step Id has been provided the sensor retrieves all steps and current
states
+ and will continue to ask until all steps reaches any of the target states.
Review comment:
Hi @feluelle
Thanks for feedback.
We've tested with EmrJobFlowSensor but it isn't behaving as we'd like. The
cluster moves in to TERMINATED state even if some steps fail, I guess because
we are setting the steps with CONTINUE status. In our use case the steps are
independent and we therefore want the others to run even if some do fail.
I would have expected the cluster to move into TERMINATED_WITH_ERRORS but
this isn't clearly documented by AWS at all. We've logged a support case to get
clarity on it. Maybe it is a bug, but unlikely.... They will probably just say
they will update the documentation.
Once we have feedback we can decide how best to handle checking for the
steps status.
My thoughts:
- I think a separate operator for this would be overkill. JobFowSensor
should work for what we want to do, if we can get it to be sensitive to the
step status as well.
- We could add to the EmrJobFlowSensor to optionally also check the status
of the steps before succeeding, and enable that check by default. I think it
would be unexpected behavior that the operator is successful even if some steps
weren't successful.
- We can XCom push the step status into their own XCom key before completing
the execute, which also enables our use case (with our use case we need to do
something with/about the failed steps so we need to know which ones they are).
What do you think?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]