Re: [PR] Modify KPO to log container log periodically [airflow]

via GitHub Mon, 12 Feb 2024 03:17:38 -0800


pankajastro commented on code in PR #37279:
URL: https://github.com/apache/airflow/pull/37279#discussion_r1486042792



##########
airflow/providers/cncf/kubernetes/operators/pod.py:
##########
@@ -659,10 +672,79 @@ def invoke_defer_method(self):
                 startup_check_interval=self.startup_check_interval_seconds,
                 base_container_name=self.base_container_name,
                 on_finish_action=self.on_finish_action.value,
+                last_log_time=last_log_time,
+                logging_interval=self.logging_interval,
             ),
-            method_name="execute_complete",
+            method_name="trigger_reentry",
         )
 
+    @staticmethod
+    def raise_for_trigger_status(event: dict[str, Any]) -> None:
+        """Raise exception if pod is not in expected state."""
+        if event["status"] == "error":
+            error_type = event["error_type"]
+            description = event["description"]
+            if error_type == "PodLaunchTimeoutException":
+                raise PodLaunchTimeoutException(description)
+            else:
+                raise AirflowException(description)
+
+    def trigger_reentry(self, context: Context, event: dict[str, Any]) -> Any:
+        """
+        Point of re-entry from trigger.
+
+        If ``logging_interval`` is None, then at this point the pod should be done and we'll just fetch
+        the logs and exit.
+
+        If ``logging_interval`` is not None, it could be that the pod is still running and we'll just
+        grab the latest logs and defer back to the trigger again.
+        """
+        remote_pod = None
+        try:
+            self.pod_request_obj = self.build_pod_request_obj(context)
+            self.pod = self.find_pod(
+                namespace=self.namespace or self.pod_request_obj.metadata.namespace,
+                context=context,
+            )
+
+            # we try to find pod before possibly raising so that on_kill will have `pod` attr
+            self.raise_for_trigger_status(event)
+
+            if not self.pod:
+                raise PodNotFoundException("Could not find pod after resuming from deferral")
+
+            if self.get_logs:
+                last_log_time = event and event.get("last_log_time")
+                if last_log_time:
+                    self.log.info("Resuming logs read from time %r", last_log_time)
+                pod_log_status = self.pod_manager.fetch_container_logs(
+                    pod=self.pod,
+                    container_name=self.BASE_CONTAINER_NAME,
+                    follow=self.logging_interval is None,
+                    since_time=last_log_time,
+                )
+                if pod_log_status.running:
+                    self.log.info("Container still running; deferring again.")
+                    self.invoke_defer_method(pod_log_status.last_log_time)
+
+            if self.do_xcom_push:
+                result = self.extract_xcom(pod=self.pod)
+            remote_pod = self.pod_manager.await_pod_completion(self.pod)
+        except TaskDeferred:
+            raise
+        except Exception:
+            self.cleanup(
+                pod=self.pod or self.pod_request_obj,
+                remote_pod=remote_pod,
+            )
+            raise
+        self.cleanup(
+            pod=self.pod or self.pod_request_obj,
+            remote_pod=remote_pod,
+        )
+        if self.do_xcom_push:
+            return result
+
     def execute_complete(self, context: Context, event: dict, **kwargs):

Review Comment:
   Thanks for the feedback. Yes, this method is public, so we can't remove it. Similarly, a couple of methods in the trigger are now unused, but I kept them as well because someone might be relying on them.
   
   We have done some testing on our end, and we will run additional tests once the RC is available. Hopefully some community members will also test the RC, so we should be able to find and fix anything unusual before the release.
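
   To illustrate the backward-compatibility point, here is a minimal sketch of how an old public entry point can stay in place while delegating to the new one. `SketchOperator` is a hypothetical stand-in, not the real operator, and the deprecation warning is an assumption for illustration rather than what this PR necessarily does; only the method names `execute_complete` and `trigger_reentry` come from the diff.

```python
from __future__ import annotations

import warnings
from typing import Any


class SketchOperator:
    """Hypothetical stand-in for an operator; not the real KubernetesPodOperator."""

    def trigger_reentry(self, context: dict[str, Any], event: dict[str, Any]) -> Any:
        # New re-entry point. In this sketch it only echoes the event status.
        return event.get("status")

    def execute_complete(self, context: dict[str, Any], event: dict[str, Any], **kwargs: Any) -> Any:
        # Old public name kept so that existing subclasses and callers keep working.
        # Emitting a DeprecationWarning here is an assumption made for this sketch.
        warnings.warn(
            "execute_complete is deprecated; use trigger_reentry instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.trigger_reentry(context=context, event=event)
```

   With a shim like this, code that still calls `execute_complete` gets the new behaviour plus a warning, and the old name can be dropped in a later major release.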



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

