syedahsn commented on code in PR #30853:
URL: https://github.com/apache/airflow/pull/30853#discussion_r1186331441


##########
airflow/providers/amazon/aws/operators/redshift_cluster.py:
##########
@@ -524,64 +527,52 @@ def __init__(
         aws_conn_id: str = "aws_default",
         deferrable: bool = False,
         poll_interval: int = 10,
+        max_attempts: int = 15,
         **kwargs,
     ):
         super().__init__(**kwargs)
         self.cluster_identifier = cluster_identifier
         self.aws_conn_id = aws_conn_id
         self.deferrable = deferrable
+        self.max_attempts = max_attempts
         self.poll_interval = poll_interval
-        # These parameters are added to address an issue with the boto3 API where the API
+        # These parameters are used to address an issue with the boto3 API where the API
         # prematurely reports the cluster as available to receive requests. This causes the cluster
         # to reject initial attempts to pause the cluster despite reporting the correct state.
         self._attempts = 10
         self._attempt_interval = 15
 
     def execute(self, context: Context):
         redshift_hook = RedshiftHook(aws_conn_id=self.aws_conn_id)
+        while self._attempts >= 1:
+            try:
+                redshift_hook.get_conn().pause_cluster(ClusterIdentifier=self.cluster_identifier)
+                break
+            except redshift_hook.get_conn().exceptions.InvalidClusterStateFault as error:
+                self._attempts = self._attempts - 1
 
+                if self._attempts > 0:
+                    self.log.error("Unable to pause cluster. %d attempts remaining.", self._attempts)
+                    time.sleep(self._attempt_interval)
+                else:
+                    raise error
         if self.deferrable:
             self.defer(
-                timeout=self.execution_timeout,
-                trigger=RedshiftClusterTrigger(
-                    task_id=self.task_id,
+                trigger=RedshiftPauseClusterTrigger(
+                    cluster_identifier=self.cluster_identifier,
                     poll_interval=self.poll_interval,
+                    max_attempts=self.max_attempts,

Review Comment:
   Using max attempts rather than a timeout was essentially a design choice driven by how the boto waiter implements things. The boto waiter defines `MaxAttempts` and `Delay`, so we decided to go with those parameters. Initially, when we had our own manual waiter, we went with a `waiter_countdown` [parameter](https://github.com/apache/airflow/blob/main/airflow/providers/amazon/aws/operators/emr.py#L176) which would map to the `timeout` on the Trigger, but we are in the process of deprecating that parameter. (As an aside, we calculate `waiter_max_attempts` as `waiter_max_attempts = waiter_countdown // waiter_check_interval_seconds`.)
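   
   For concreteness, a quick illustration of that mapping (the numbers here are made up):
   
   ```python
   # Illustration only: converting the deprecated waiter_countdown (total
   # seconds to wait) into the boto-style max attempts count.
   waiter_check_interval_seconds = 60  # hypothetical: poll once per minute
   waiter_countdown = 25 * 60          # hypothetical: wait up to 25 minutes total
   waiter_max_attempts = waiter_countdown // waiter_check_interval_seconds
   assert waiter_max_attempts == 25
   ```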
   
   >one problem with using max_attempts in trigger is if trigger is killed and is respawned it will reset -- not so if we use timeout.
   
   This is a good point. We could solve this by making the `attempt` number part of the trigger's kwargs, which would preserve the value if the trigger were killed at any point. A rough sketch of what I mean is below. What do you think?
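   
   A minimal sketch of that idea (illustration only, not the actual `RedshiftPauseClusterTrigger`; the class name, classpath string, and polling body are placeholders):
   
   ```python
   import asyncio
   
   from airflow.triggers.base import BaseTrigger, TriggerEvent
   
   
   class ExamplePauseClusterTrigger(BaseTrigger):
       """Hypothetical trigger that carries its attempt count in its kwargs."""
   
       def __init__(self, cluster_identifier, poll_interval, max_attempts, attempt=0):
           super().__init__()
           self.cluster_identifier = cluster_identifier
           self.poll_interval = poll_interval
           self.max_attempts = max_attempts
           self.attempt = attempt  # starts at 0; carried in the serialized kwargs
   
       def serialize(self):
           # Airflow rebuilds the trigger from this (classpath, kwargs) pair, so
           # including `attempt` means a respawned trigger resumes its count
           # instead of resetting to zero.
           return (
               "example.module.ExamplePauseClusterTrigger",  # placeholder classpath
               {
                   "cluster_identifier": self.cluster_identifier,
                   "poll_interval": self.poll_interval,
                   "max_attempts": self.max_attempts,
                   "attempt": self.attempt,
               },
           )
   
       async def run(self):
           while self.attempt < self.max_attempts:
               self.attempt += 1
               # ... check the cluster state here and yield a success
               # TriggerEvent once the cluster is paused ...
               await asyncio.sleep(self.poll_interval)
           yield TriggerEvent({"status": "failure", "message": "max attempts reached"})
   ```
   
   (Exactly when the updated count would get re-serialized is something we'd need to confirm.)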


