fdemiane commented on issue #39236: URL: https://github.com/apache/airflow/issues/39236#issuecomment-2132387258
Looking at the logs, the duplicated lines all fall within one second of each other. In the code [here](https://github.com/apache/airflow/blob/providers-cncf-kubernetes/7.13.0/airflow/providers/cncf/kubernetes/utils/pod_manager.py#L424), `read_pod_logs` takes `since_seconds`, which is expressed in seconds, and passes it to [`_client.read_namespaced_pod_log`](https://github.com/apache/airflow/blob/providers-cncf-kubernetes/7.13.0/airflow/providers/cncf/kubernetes/utils/pod_manager.py#L645) (docs [here](https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/CoreV1Api.md#read_namespaced_pod_log)), which does not support a finer-grained time representation. Looking at the [Kubernetes API reference](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.30/), it doesn't seem to support a finer-grained time representation either.

kubectl does seem to support a since_time option (`--since-time`), which takes a timestamp that supports milliseconds, as seen [here](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_logs/#options). After a little searching, I also found [this older issue](https://github.com/kubernetes-client/python/issues/1351) about it.

The **optimal** fix for this issue is to provide a way to pass a since_time in the kubernetes client (out of scope for Airflow), and then make the necessary code changes in the KPO. A **quick win** would be to add a warning message that logs within one second might get duplicated (maybe [here](airflow/providers/cncf/kubernetes/utils/pod_manager.py)?).

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
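To illustrate the duplication described in the comment, here is a minimal, self-contained sketch in plain Python. It does not call the Kubernetes client or Airflow. The helper names (`whole_second_window`, `fetch_since_seconds`) and the sample log lines are hypothetical, and it assumes that the elapsed time since the last line read is rounded up to a whole number of seconds before being used as `since_seconds`.

```python
import math
from datetime import datetime, timedelta, timezone


def whole_second_window(last_seen, now):
    """Hypothetical helper: elapsed time rounded UP to whole seconds (assumption)."""
    return math.ceil((now - last_seen).total_seconds())


def fetch_since_seconds(log, now, since_seconds):
    """Hypothetical stand-in for a log read limited to the last `since_seconds` seconds."""
    cutoff = now - timedelta(seconds=since_seconds)
    return [(ts, msg) for ts, msg in log if ts >= cutoff]


base = datetime(2024, 5, 27, 12, 0, 0, tzinfo=timezone.utc)

# Three made-up log lines emitted within the same second.
log = [
    (base + timedelta(milliseconds=100), "line a"),
    (base + timedelta(milliseconds=600), "line b"),
    (base + timedelta(milliseconds=900), "line c"),
]

# Suppose the first read consumed "line a" and "line b" before the stream dropped.
already_read = log[:2]
last_seen = already_read[-1][0]

# The reader reconnects 0.4 s after the last line it saw.
now = base + timedelta(seconds=1)
since_seconds = whole_second_window(last_seen, now)  # 0.4 s is rounded up to 1

# The whole-second window reaches back past lines that were already read.
refetched = fetch_since_seconds(log, now, since_seconds)
duplicates = [msg for ts, msg in refetched if (ts, msg) in already_read]

print(since_seconds, duplicates)
```

With a millisecond-precision cutoff (the since_time idea above), the window could start right after `last_seen` and only "line c" would be returned; with whole seconds, the lines already read inside that second come back again.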
