o-nikolas commented on code in PR #27276:
URL: https://github.com/apache/airflow/pull/27276#discussion_r1011087776


##########
airflow/providers/amazon/aws/operators/redshift_cluster.py:
##########
@@ -392,20 +393,31 @@ def __init__(
         *,
         cluster_identifier: str,
         aws_conn_id: str = "aws_default",
+        attempts: int = 1,
+        attempt_interval: int = 30,
         **kwargs,
     ):
         super().__init__(**kwargs)
         self.cluster_identifier = cluster_identifier
         self.aws_conn_id = aws_conn_id
+        self.attempts = attempts
+        self.attempt_interval = attempt_interval
 
     def execute(self, context: Context):
         redshift_hook = RedshiftHook(aws_conn_id=self.aws_conn_id)
-        cluster_state = 
redshift_hook.cluster_status(cluster_identifier=self.cluster_identifier)
-        if cluster_state == "paused":
-            self.log.info("Starting Redshift cluster %s", 
self.cluster_identifier)
-            
redshift_hook.get_conn().resume_cluster(ClusterIdentifier=self.cluster_identifier)
-        else:
-            raise Exception(f"Unable to resume cluster - cluster state is 
{cluster_state}")
+
+        while self.attempts >= 1:
+            try:
+                
redshift_hook.get_conn().resume_cluster(ClusterIdentifier=self.cluster_identifier)
+                return
+            except 
redshift_hook.get_conn().exceptions.InvalidClusterStateFault as error:
+                self.attempts = self.attempts - 1
+
+                if self.attempts > 0:
+                    self.log.error("Unable to resume cluster. %d attempts 
remaining.", self.attempts)
+                    time.sleep(self.attempt_interval)
+                else:
+                    raise error

Review Comment:
   IMHO the top level `retries` and `retry_delay` are for users to generally 
dictate what an operator should do when it unexpectedly fails. The code Syed is 
adding here is to actually account for a known poor behaviour on the AWS 
redshift API/SDK where it unreliably reports a good state but still rejects 
mutations to the cluster for a short time.
   
   Though I do agree with @eladkal's second point, we don't need to expose the 
parameters to the users. @syedahsn we can drop the arguments you added to the 
method stub and just define them locally here with reasonable defaults. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to