fdemiane opened a new issue, #33498:
URL: https://github.com/apache/airflow/issues/33498
### Apache Airflow version
main (development)
### What happened
This is a random issue. The KubernetesPodOperator duplicates logs when they
are interrupted.
### What you think should happen instead
When the logs are interrupted, the KubernetesPodOperator should continue
logging since the last captured timestamp, and not re-print logs that were
already printed.
### How to reproduce
If you run the following dag mulltiple times, you might get duplicated logs.
Note that the issue is random and not easily reproducible:
```
import datetime
from airflow import models
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import
KubernetesPodOperator
YESTERDAY = datetime.datetime.now() - datetime.timedelta(days=1)
with models.DAG(
dag_id="composer_sample_kubernetes_pod",
schedule_interval=datetime.timedelta(days=1),
start_date=YESTERDAY,
) as dag:
timeout = 240
iterations = 1000
arguments = \
'for i in {1..%(iterations)s}; do echo "$i of %(iterations)s"; done'
% {'iterations': iterations}
kubernetes_min_pod_0 = KubernetesPodOperator(
task_id="pod-ex-minimum-0",
name="pod-ex-minimum-0",
cmds=["/bin/bash", "-c"],
arguments=[arguments],
namespace="default",
image="gcr.io/gcp-runtimes/ubuntu_18_0_4",
startup_timeout_seconds=timeout
)
```
Here is a sample output of duplicated logs:
[2023-08-18, 13:32:54 UTC] {kublogduplication.py:81} INFO - [base] 994 of
1000
[2023-08-18, 13:32:54 UTC] {kublogduplication.py:81} INFO - [base] 995 of
1000
[2023-08-18, 13:32:54 UTC] {kublogduplication.py:81} INFO - [base] 996 of
1000
[2023-08-18, 13:32:54 UTC] {kublogduplication.py:81} INFO - [base] 997 of
1000
[2023-08-18, 13:32:54 UTC] {kublogduplication.py:81} INFO - [base] 998 of
1000
[2023-08-18, 13:32:54 UTC] {kublogduplication.py:81} INFO - [base] 999 of
1000
[2023-08-18, 13:32:54 UTC] {kublogduplication.py:81} INFO - [base] 1000 of
1000
[2023-08-18, 13:32:54 UTC] {pod_manager.py:450} ERROR - Error parsing
timestamp (no timestamp in message ''). Will continue execution but won't
update timestamp
[2023-08-18, 13:32:54 UTC] {kublogduplication.py:81} INFO - [base]
[2023-08-18, 13:32:54 UTC] {kublogduplication.py:116} WARNING - Pod
pod-ex-minimum-0-b3bf545v log read interrupted but container base still running
[2023-08-18, 13:32:55 UTC] {before.py:35} INFO - Starting call to
'unusual_prefix_01add01418c7cdaf7afbb9c57d62a05595a85aa7_kublogduplication.CustomPodManager.fetch_container_logs.<locals>.consume_logs',
this is the 1st time calling it.
[2023-08-18, 13:32:55 UTC] {kublogduplication.py:81} INFO - [base] 1 of 1000
[2023-08-18, 13:32:55 UTC] {kublogduplication.py:81} INFO - [base] 2 of 1000
[2023-08-18, 13:32:55 UTC] {kublogduplication.py:81} INFO - [base] 3 of 1000
[2023-08-18, 13:32:55 UTC] {kublogduplication.py:81} INFO - [base] 4 of 1000
[2023-08-18, 13:32:55 UTC] {kublogduplication.py:81} INFO - [base] 5 of 1000
[2023-08-18, 13:32:55 UTC] {kublogduplication.py:81} INFO - [base] 6 of 1000
[2023-08-18, 13:32:55 UTC] {kublogduplication.py:81} INFO - [base] 7 of 1000
[2023-08-18, 13:32:55 UTC] {kublogduplication.py:81} INFO - [base] 8 of 1000
### Operating System
Ubuntu
### Versions of Apache Airflow Providers
main
### Deployment
Google Cloud Composer
### Deployment details
_No response_
### Anything else
_No response_
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]