schattian opened a new issue, #23497:
URL: https://github.com/apache/airflow/issues/23497

   ### Apache Airflow version
   
   2.2.4
   
   ### What happened
   
   I observed that some workers randomly stopped making progress after running 
for a while.
   After some investigation, the issue appears to be in the new Kubernetes pod 
operator and depends on an open issue in the Kubernetes API.
   
   When a log rotation event occurs in Kubernetes, the stream we consume in 
`fetch_container_logs(follow=True, ...)` is no longer fed.
   
   Therefore, the Kubernetes pod operator hangs indefinitely in the middle of 
the log. Only a SIGTERM can terminate it, because log consumption blocks 
`execute()` from finishing.
   
   Ref to the issue in kubernetes: 
https://github.com/kubernetes/kubernetes/issues/59902
   
   Linking to https://github.com/apache/airflow/issues/12103 for reference, as 
the result is more or less the same for the end user (although the root cause 
is different).
   
   ### What you think should happen instead
   
   The pod should not hang.
   The operator could follow the logs from the new container, but that is out 
of scope for Airflow, so I don't think it should.
   
   
   ### How to reproduce
   
   _No response_
   
   ### Operating System
   
   Debian GNU/Linux 11 (bullseye)
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow==2.2.4
   apache-airflow-providers-google==6.4.0
   apache-airflow-providers-cncf-kubernetes==3.0.2
   
   However, this should be reproducible with master.
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   I think there are several ways to work around this on the Airflow side so 
that it does not hang indefinitely (for example, making `fetch_container_logs` 
non-blocking for `execute` and instead always blocking until 
status.phase.completed, as is currently done when get_logs is not true).
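
A minimal sketch of that idea, using only the Python standard library rather 
than the provider's actual code. The names `await_pod_with_background_logs`, 
`log_stream`, and `get_phase` are hypothetical stand-ins: `log_stream` plays 
the role of the followed log stream and `get_phase` the role of a call that 
reads the pod's `status.phase`. Log consumption runs in a daemon thread, so a 
stalled stream cannot keep the caller from returning once the pod reaches a 
terminal phase.

```python
import threading
import time
from typing import Callable, Iterable, List, Tuple

# Kubernetes pod phases that mean the pod has finished running.
TERMINAL_PHASES = {"Succeeded", "Failed"}


def await_pod_with_background_logs(
    log_stream: Iterable[str],
    get_phase: Callable[[], str],
    poll_interval: float = 0.05,
) -> Tuple[str, List[str]]:
    """Wait for a terminal pod phase while logs are consumed in the background.

    Hypothetical helper: `log_stream` and `get_phase` are assumptions made for
    illustration, not part of the Airflow or Kubernetes client APIs.
    """
    collected: List[str] = []

    def consume() -> None:
        # May block forever if the stream stalls (e.g. after log rotation).
        for line in log_stream:
            collected.append(line)

    # Daemon thread: a stuck reader does not keep the process alive.
    follower = threading.Thread(target=consume, daemon=True)
    follower.start()

    # The pod phase, not the log stream, decides when we are done.
    while True:
        phase = get_phase()
        if phase in TERMINAL_PHASES:
            break
        time.sleep(poll_interval)

    # Give the reader a short grace period to flush, then stop waiting for it.
    follower.join(timeout=poll_interval)
    return phase, list(collected)
```

With this shape, lines emitted after the stream stalls are still lost, but 
the task finishes when the pod does instead of waiting for a SIGTERM.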
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
