fdemiane opened a new issue, #33498:
URL: https://github.com/apache/airflow/issues/33498

   ### Apache Airflow version
   
   main (development)
   
   ### What happened
   
   This is a random issue. The KubernetesPodOperator duplicates logs when they 
are interrupted.
   
   ### What you think should happen instead
   
   When the logs are interrupted, the KubernetesPodOperator should continue 
logging since the last captured timestamp, and not re-print logs that were 
already printed.
   
   ### How to reproduce
   
   If you run the following dag mulltiple times, you might get duplicated logs. 
Note that the issue is random and not easily reproducible:
   
   ```
   import datetime
   
   from airflow import models
   from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import 
KubernetesPodOperator
   
   YESTERDAY = datetime.datetime.now() - datetime.timedelta(days=1)
   
   with models.DAG(
       dag_id="composer_sample_kubernetes_pod",
       schedule_interval=datetime.timedelta(days=1),
       start_date=YESTERDAY,
   ) as dag:
       timeout = 240
       iterations = 1000
       arguments = \
           'for i in {1..%(iterations)s}; do echo "$i of %(iterations)s"; done' 
% {'iterations': iterations}
   
       kubernetes_min_pod_0 = KubernetesPodOperator(
           task_id="pod-ex-minimum-0",
           name="pod-ex-minimum-0",
           cmds=["/bin/bash", "-c"],
           arguments=[arguments],
           namespace="default",
           image="gcr.io/gcp-runtimes/ubuntu_18_0_4",
           startup_timeout_seconds=timeout
       )
   ```
   
   Here is a sample output of duplicated logs:
   
   [2023-08-18, 13:32:54 UTC] {kublogduplication.py:81} INFO - [base] 994 of 
1000
   [2023-08-18, 13:32:54 UTC] {kublogduplication.py:81} INFO - [base] 995 of 
1000
   [2023-08-18, 13:32:54 UTC] {kublogduplication.py:81} INFO - [base] 996 of 
1000
   [2023-08-18, 13:32:54 UTC] {kublogduplication.py:81} INFO - [base] 997 of 
1000
   [2023-08-18, 13:32:54 UTC] {kublogduplication.py:81} INFO - [base] 998 of 
1000
   [2023-08-18, 13:32:54 UTC] {kublogduplication.py:81} INFO - [base] 999 of 
1000
   [2023-08-18, 13:32:54 UTC] {kublogduplication.py:81} INFO - [base] 1000 of 
1000
   [2023-08-18, 13:32:54 UTC] {pod_manager.py:450} ERROR - Error parsing 
timestamp (no timestamp in message ''). Will continue execution but won't 
update timestamp
   [2023-08-18, 13:32:54 UTC] {kublogduplication.py:81} INFO - [base] 
   [2023-08-18, 13:32:54 UTC] {kublogduplication.py:116} WARNING - Pod 
pod-ex-minimum-0-b3bf545v log read interrupted but container base still running
   [2023-08-18, 13:32:55 UTC] {before.py:35} INFO - Starting call to 
'unusual_prefix_01add01418c7cdaf7afbb9c57d62a05595a85aa7_kublogduplication.CustomPodManager.fetch_container_logs.<locals>.consume_logs',
 this is the 1st time calling it.
   [2023-08-18, 13:32:55 UTC] {kublogduplication.py:81} INFO - [base] 1 of 1000
   [2023-08-18, 13:32:55 UTC] {kublogduplication.py:81} INFO - [base] 2 of 1000
   [2023-08-18, 13:32:55 UTC] {kublogduplication.py:81} INFO - [base] 3 of 1000
   [2023-08-18, 13:32:55 UTC] {kublogduplication.py:81} INFO - [base] 4 of 1000
   [2023-08-18, 13:32:55 UTC] {kublogduplication.py:81} INFO - [base] 5 of 1000
   [2023-08-18, 13:32:55 UTC] {kublogduplication.py:81} INFO - [base] 6 of 1000
   [2023-08-18, 13:32:55 UTC] {kublogduplication.py:81} INFO - [base] 7 of 1000
   [2023-08-18, 13:32:55 UTC] {kublogduplication.py:81} INFO - [base] 8 of 1000
   
   ### Operating System
   
   Ubuntu
   
   ### Versions of Apache Airflow Providers
   
   main
   
   ### Deployment
   
   Google Cloud Composer
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to