michalmisiewicz commented on a change in pull request #11325:
URL: https://github.com/apache/airflow/pull/11325#discussion_r501820421
##########
File path: airflow/kubernetes/pod_launcher.py
##########
@@ -124,9 +125,21 @@ def monitor_pod(self, pod: V1Pod, get_logs: bool) ->
Tuple[State, Optional[str]]
:return: Tuple[State, Optional[str]]
"""
if get_logs:
- logs = self.read_pod_logs(pod)
- for line in logs:
- self.log.info(line)
+ read_logs_since_sec = None
+ while True:
+ logs = self.read_pod_logs(pod,
since_seconds=read_logs_since_sec)
+ for line in logs:
+ self.log.info(line)
+ curr_time = dt.now()
+ time.sleep(1)
+
+ if not self.base_container_is_running(pod):
+ break
+
+ self.log.warning('Pod %s log read interrupted',
pod.metadata.name)
+ delta = dt.now() - curr_time
+ # Prefer logs duplication rather than loss
+ read_logs_since_sec = math.ceil(delta.total_seconds())
Review comment:
Good questions. `list_namespaced_pod` has option to prepend timestamp to
log line. Example
```
2020-10-08T14:16:17.793417674Z 172.32.2.36 - - [08/Oct/2020:14:16:17 +0000]
"GET /health HTTP/1.1" 200 187 "-" "kube-probe/1.17
```
We can enable this option, and split log on timestamp and message in
`monitor_pod`. Timestamp is returned in RFC3339Nano format so it should be
easily parsable using `pendulum`. `Pendulum` should handle possible timezone
difference between Kubernetes API and pods.
If you are ok with new design I can refactor the code.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]