TobKed commented on a change in pull request #8550:
URL: https://github.com/apache/airflow/pull/8550#discussion_r501785531



##########
File path: airflow/providers/google/cloud/hooks/dataflow.py
##########
@@ -282,41 +295,70 @@ def wait_for_done(self) -> None:
             time.sleep(self._poll_sleep)
             self._refresh_jobs()
 
-    def get_jobs(self) -> List[Dict]:
+    def get_jobs(self, refresh=False) -> List[Dict]:
         """
         Returns Dataflow jobs.
 
         :return: list of jobs
         :rtype: list
         """
-        if not self._jobs:
+        if not self._jobs or refresh:
             self._refresh_jobs()
         if not self._jobs:
             raise ValueError("Could not read _jobs")
 
         return self._jobs
 
+    def _wait_for_states(self, expected_states: Set[str]):
+        """
+        Waiting for the jobs to reach a certain state.
+        """
+        if not self._jobs:
+            raise ValueError("The _jobs should be set")
+        while True:

Review comment:
       PTAL: 
https://github.com/apache/airflow/pull/8550/commits/144b63f4ec9835d8c6c57816ab04761b83bee6c2
   
   Since the `cancel` method is executed only by the operator (`on_kill`), I 
did not make this timeout user-configurable: I don't think it is worth the 
added complexity for the user or for the code itself. The timeout is there to 
prevent `_wait_for_states` from hanging forever when `execution_timeout` is not 
set, and to provide a more meaningful log message if the timeout does occur.
   
   I proposed a default timeout of 5 minutes. @aaltay @kamilwu what do you think 
about this value?
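   
   For illustration only, here is a minimal sketch of the kind of bounded wait 
described in this comment. It is not the code from the linked commit: the 
function name `wait_for_states`, the `fetch_states` callback, and the injectable 
`sleep` and `clock` parameters are all hypothetical and exist only to keep the 
example self-contained. The job state strings are used as sample values.

```python
import time
from typing import Callable, Iterable, Set


def wait_for_states(
    fetch_states: Callable[[], Iterable[str]],  # hypothetical: returns current job states
    expected_states: Set[str],
    timeout: float = 5 * 60,  # the 5-minute default proposed in this comment
    poll_sleep: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll until every job state is in expected_states, or raise on timeout."""
    deadline = clock() + timeout
    while True:
        states = set(fetch_states())
        if not states:
            # Mirrors the guard in the diff: there must be jobs to wait on.
            raise ValueError("No job states to wait on")
        if states <= expected_states:
            return
        if clock() >= deadline:
            # A bounded wait can report what it last saw instead of hanging.
            raise TimeoutError(
                f"Jobs did not reach {sorted(expected_states)} within "
                f"{timeout} seconds; last seen states: {sorted(states)}"
            )
        sleep(poll_sleep)
```

   With a fixed deadline the loop always terminates, even when 
`execution_timeout` is not set, and the error message names the states that were 
last observed.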


Review comment:
       @mik-laj what do you think about removing the waiting? I lean towards 
@turbaszek's opinion.


Review comment:
       @aaltay I created a separate draft PR for handling draining on kill: 
https://github.com/apache/airflow/pull/11374. WDYT?



