[GitHub] [airflow] jedcunningham commented on a diff in pull request #28336: Fixed hanged KubernetesPodOperator

GitBox Fri, 30 Dec 2022 09:22:53 -0800


jedcunningham commented on code in PR #28336:
URL: https://github.com/apache/airflow/pull/28336#discussion_r1059470652



##########
airflow/providers/cncf/kubernetes/utils/pod_manager.py:
##########
@@ -91,6 +99,61 @@ def get_container_termination_message(pod: V1Pod, container_name: str):
         return container_status.state.terminated.message if container_status else None
 
 
+class PodLogsConsumer:
+    """
+    PodLogsConsumer is responsible for pulling pod logs from a stream with checking a container status before
+    reading data.
+    This class is a workaround for the issue https://github.com/apache/airflow/issues/23497
+    """
+
+    def __init__(
+        self,
+        response: HTTPResponse,
+        pod: V1Pod,
+        pod_manager: PodManager,
+        container_name: str,
+        timeout: int = 120,
+    ):
+        self.response = response
+        self.pod = pod
+        self.pod_manager = pod_manager
+        self.container_name = container_name
+        self.timeout = timeout
+
+    def __iter__(self) -> Generator[bytes, None, None]:
+        messages: list[bytes] = []
+        if self.logs_available():
+            for chunk in self.response.stream(amt=None, decode_content=True):
+                if b"\n" in chunk:
+                    chunks = chunk.split(b"\n")
+                    yield b"".join(messages) + chunks[0] + b"\n"
+                    for x in chunks[1:-1]:
+                        yield x + b"\n"
+                    if chunks[-1]:
+                        messages = [chunks[-1]]
+                    else:
+                        messages = []
+                else:
+                    messages.append(chunk)
+                if not self.logs_available():

Review Comment:
   This whole approach seems a little backwards to me, if I'm understanding
things correctly. _Sometimes_ we can't continue fetching logs after 2 minutes
(or whatever the timeout is set to), so we _always_ stop trying then?
   
   Would setting a timeout on the request itself guard us from getting stuck 
forever? Maybe `_request_timeout` on `read_namespaced_pod_log`? If not, I feel 
like we should find a way to bail once we know we've hit it instead of just not 
attempting it in the first place.
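The alternative raised above (bail out once a read actually stalls, rather than refusing to read at all) can be sketched without a cluster. This is a minimal standalone illustration under stated assumptions: `iter_log_lines` is a hypothetical helper, a local `socket.socketpair()` stands in for the Kubernetes log stream, and `settimeout` plays roughly the role that the read component of `_request_timeout` (when passed as a `(connect, read)` tuple) would play for `read_namespaced_pod_log`. The line reassembly mirrors what `PodLogsConsumer.__iter__` does with partial chunks.

```python
import socket
from collections.abc import Iterator


def iter_log_lines(sock: socket.socket, read_timeout: float) -> Iterator[bytes]:
    """Yield newline-terminated lines from ``sock``, giving up if a read stalls.

    Hypothetical helper. Assumption: a hang shows up as a read that never
    returns, so a per-read timeout lets us stop once it is hit instead of
    blocking forever.
    """
    sock.settimeout(read_timeout)  # applies to every recv() below
    buffer = b""
    while True:
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            break  # stream stalled: stop instead of hanging
        if not chunk:
            break  # peer closed the stream normally
        buffer += chunk
        # Emit every complete line; keep the trailing partial line buffered.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            yield line + b"\n"
    if buffer:
        yield buffer  # flush a final line that had no trailing newline


if __name__ == "__main__":
    writer, reader = socket.socketpair()
    writer.sendall(b"first line\nsecond ")
    writer.sendall(b"line\npartial")
    # The writer stays open but silent, simulating a stream that hangs.
    print(list(iter_log_lines(reader, read_timeout=0.5)))
    # [b'first line\n', b'second line\n', b'partial']
    writer.close()
    reader.close()
```

Because the timeout is checked on every read, logs keep flowing for as long as data arrives, and the consumer only gives up after the stream has actually gone quiet.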



##########
airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py:
##########
@@ -168,6 +168,7 @@ class KubernetesPodOperator(BaseOperator):
     :param labels: labels to apply to the Pod. (templated)
     :param startup_timeout_seconds: timeout in seconds to startup the pod.
     :param get_logs: get the stdout of the container as logs of the tasks.
+    :param logs_timeout: timeout in seconds to read logs after container termination.

Review Comment:
   I think a constant is enough for this case. I really don't think it should
be in the user-facing API.
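A sketch of what the constant-based approach could look like, assuming the 120-second default from `PodLogsConsumer.__init__` is kept. `POD_LOGS_TIMEOUT_SECONDS` and `LogsConsumerSketch` are hypothetical names for illustration only; the idea is that the value lives in a module-level constant and the `logs_timeout` operator parameter is dropped.

```python
# Hypothetical module-level constant (name is illustrative), used instead of
# exposing a logs_timeout argument on KubernetesPodOperator.
POD_LOGS_TIMEOUT_SECONDS = 120


class LogsConsumerSketch:
    """Illustrative stand-in for PodLogsConsumer that takes its default
    timeout from the module constant rather than from the operator."""

    def __init__(self, container_name: str, timeout: int = POD_LOGS_TIMEOUT_SECONDS) -> None:
        self.container_name = container_name
        self.timeout = timeout
```

Internal callers and tests can still pass `timeout=` explicitly, while DAG authors never see the knob.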


