baryluk commented on a change in pull request #17649:
URL: https://github.com/apache/airflow/pull/17649#discussion_r690766253



##########
File path: airflow/providers/cncf/kubernetes/utils/pod_launcher.py
##########
@@ -143,12 +143,21 @@ def monitor_pod(self, pod: V1Pod, get_logs: bool) -> Tuple[State, V1Pod, Optiona
             read_logs_since_sec = None
             last_log_time = None
             while True:
-                logs = self.read_pod_logs(pod, timestamps=True, since_seconds=read_logs_since_sec)
-                for line in logs:
-                    timestamp, message = self.parse_log_line(line.decode('utf-8'))
-                    self.log.info(message)
-                    if timestamp:
-                        last_log_time = timestamp
+                try:
+                    logs = self.read_pod_logs(pod, timestamps=True, since_seconds=read_logs_since_sec)
+                    for line in logs:
+                        timestamp, message = self.parse_log_line(line.decode('utf-8'))
+                        self.log.info(message)
+                        if timestamp:
+                            last_log_time = timestamp
+                except Exception as e:

Review comment:
       I wanted something broad, though `Exception` might be a bit too much. But 
more can go wrong beyond `TimeoutError`, e.g. DNS issues, authorization issues, 
SSL errors, protocol errors, etc. Basically all of these should be ignored: 
check whether the pod is still alive, and retry later.
   
   `TimeoutError` would definitely help in my use case, but I think it is a 
bit too narrow, and it also relies on the fact that the kube client uses 
urllib3 internally, which might not be the case in the future.
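
To make the intended behavior concrete, here is a minimal sketch of the "ignore the error, check whether the pod is still alive, retry later" loop. It is not the code in this PR: `follow_logs`, `read_logs`, `pod_is_running`, `retry_delay` and `sleep` are hypothetical names used only for illustration, and `read_logs` is assumed to return only lines that have not been seen yet (the `since_seconds` bookkeeping from the diff is left out).

```python
import logging
import time
from typing import Callable, Iterable, List

log = logging.getLogger(__name__)


def follow_logs(
    read_logs: Callable[[], Iterable[bytes]],   # hypothetical: yields new log lines
    pod_is_running: Callable[[], bool],         # hypothetical: pod liveness check
    retry_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Collect log lines until the pod stops, ignoring failed log reads."""
    messages: List[str] = []
    while True:
        try:
            for line in read_logs():
                messages.append(line.decode("utf-8"))
        except Exception as e:
            # Deliberately broad: timeouts, DNS, auth, SSL and protocol errors
            # are all treated the same way, without depending on which HTTP
            # library the client happens to use internally.
            log.warning("Reading pod logs failed, will retry: %s", e)
        # Whether or not the read succeeded, the pod state decides if we stop.
        if not pod_is_running():
            return messages
        sleep(retry_delay)
```

The point of the broad `except` is that a failed log read never ends the loop on its own; only the pod liveness check does.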





