syedahsn commented on code in PR #30853:
URL: https://github.com/apache/airflow/pull/30853#discussion_r1186331441
##########
airflow/providers/amazon/aws/operators/redshift_cluster.py:
##########
@@ -524,64 +527,52 @@ def __init__(
aws_conn_id: str = "aws_default",
deferrable: bool = False,
poll_interval: int = 10,
+ max_attempts: int = 15,
**kwargs,
):
super().__init__(**kwargs)
self.cluster_identifier = cluster_identifier
self.aws_conn_id = aws_conn_id
self.deferrable = deferrable
+ self.max_attempts = max_attempts
self.poll_interval = poll_interval
- # These parameters are added to address an issue with the boto3 API
where the API
+ # These parameters are used to address an issue with the boto3 API
where the API
# prematurely reports the cluster as available to receive requests.
This causes the cluster
# to reject initial attempts to pause the cluster despite reporting
the correct state.
self._attempts = 10
self._attempt_interval = 15
def execute(self, context: Context):
redshift_hook = RedshiftHook(aws_conn_id=self.aws_conn_id)
+ while self._attempts >= 1:
+ try:
+
redshift_hook.get_conn().pause_cluster(ClusterIdentifier=self.cluster_identifier)
+ break
+ except
redshift_hook.get_conn().exceptions.InvalidClusterStateFault as error:
+ self._attempts = self._attempts - 1
+ if self._attempts > 0:
+ self.log.error("Unable to pause cluster. %d attempts
remaining.", self._attempts)
+ time.sleep(self._attempt_interval)
+ else:
+ raise error
if self.deferrable:
self.defer(
- timeout=self.execution_timeout,
- trigger=RedshiftClusterTrigger(
- task_id=self.task_id,
+ trigger=RedshiftPauseClusterTrigger(
+ cluster_identifier=self.cluster_identifier,
poll_interval=self.poll_interval,
+ max_attempts=self.max_attempts,
Review Comment:
Using max attempts rather than a timeout was essentially a design choice driven by
how the boto waiter implements things. The boto waiter defines `MaxAttempts`
and `Delay`, so we decided to go with those parameters. Initially, when we had
our own manual waiter, we were going with a `waiter_countdown`
[parameter](https://github.com/apache/airflow/blob/main/airflow/providers/amazon/aws/operators/emr.py#L176),
which would map to the `timeout` on the Trigger, but we are in the process of
deprecating that parameter. (As an aside, we calculate `waiter_max_attempts`
as `waiter_max_attempts = waiter_countdown // waiter_check_interval_seconds`.)
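For illustration, here is a minimal sketch of how those two parameters feed a
boto3 `WaiterConfig`, using the stock `cluster_available` waiter as a stand-in
(the pause case relies on a custom waiter, and the cluster name is made up):

```python
# Illustrative sketch only: shows how poll_interval / max_attempts line up
# with a boto3 waiter's Delay / MaxAttempts. `cluster_available` is a stock
# boto3 Redshift waiter used as a stand-in; "my-cluster" is a made-up name.
import boto3

client = boto3.client("redshift")
waiter = client.get_waiter("cluster_available")
waiter.wait(
    ClusterIdentifier="my-cluster",
    WaiterConfig={
        "Delay": 10,        # seconds between polls, i.e. poll_interval
        "MaxAttempts": 15,  # number of polls, i.e. max_attempts
    },
)

# The deprecated waiter_countdown relates to these as:
#   waiter_max_attempts = waiter_countdown // waiter_check_interval_seconds
# e.g. a 150s countdown polled every 10s gives 15 attempts.
```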
> one problem with using max_attempts in trigger is if trigger is killed and
is respawned it will reset -- not so if we use timeout.
This is a good point. We could address it by making the `attempt` number part
of the trigger's kwargs, which would preserve the value if the trigger were
killed at any point. What do you think?
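For concreteness, a minimal sketch of that idea, assuming a hypothetical
trigger class (the class name, module path, and polling placeholder are
illustrative, not the actual `RedshiftPauseClusterTrigger` implementation):

```python
# Hypothetical sketch (not the PR's code): carrying `attempt` in the
# trigger's serialized kwargs so a re-instantiated trigger starts from the
# persisted count instead of resetting to zero.
import asyncio

from airflow.triggers.base import BaseTrigger, TriggerEvent


class ExamplePauseClusterTrigger(BaseTrigger):
    def __init__(self, cluster_identifier: str, poll_interval: int, max_attempts: int, attempt: int = 0):
        super().__init__()
        self.cluster_identifier = cluster_identifier
        self.poll_interval = poll_interval
        self.max_attempts = max_attempts
        self.attempt = attempt  # starts from the persisted value on respawn

    def serialize(self):
        # `attempt` is included in the kwargs, so whatever value is stored
        # is what a re-instantiated trigger resumes from.
        return (
            "example.module.ExamplePauseClusterTrigger",
            {
                "cluster_identifier": self.cluster_identifier,
                "poll_interval": self.poll_interval,
                "max_attempts": self.max_attempts,
                "attempt": self.attempt,
            },
        )

    async def run(self):
        while self.attempt < self.max_attempts:
            self.attempt += 1
            # ... poll the cluster state here and yield a success
            # TriggerEvent once the cluster reports paused ...
            await asyncio.sleep(self.poll_interval)
        yield TriggerEvent({"status": "failure", "message": "Pause timed out"})
```

The key point is that a trigger is rebuilt from its serialized kwargs, so
anything kept there is what a respawned trigger starts from.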