vmtuan12 opened a new issue, #52865:
URL: https://github.com/apache/airflow/issues/52865

   ### Apache Airflow Provider(s)
   
   cncf-kubernetes
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-cncf-kubernetes==10.1.0
   
   ### Apache Airflow version
   
   2.10.5
   
   ### Operating System
   
   Debian GNU/Linux 12 (bookworm)
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   When using [KubernetesPodOperator](https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/operators.html#kubernetespodoperator),
a pod that runs for a long time sometimes fails with an exception like this:
   
   ```
   urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(4090 bytes read, 7603 more expected)', IncompleteRead(4090 bytes read, 7603 more expected))
   ```
   
   Other developers have asked about this issue at 
https://stackoverflow.com/questions/66135704/airflow-kubernetespodoperator-losing-connection-to-worker-pod,
where the suggested workaround is to set `get_logs` to `False`. That is not a 
good solution, since the pod's logs are then no longer pulled into the task log.
   
   The issue originates in the `kubernetes-client` module and has been 
reported by multiple developers at 
https://github.com/kubernetes-client/python/issues/972.
   
   ### What you think should happen instead
   
   The upstream issue remains open, which means the root cause has not been 
fixed. In my testing, however, implementing wait and retry fixed this after 
just 1 attempt.
   
   Adding a try/except block that catches errors such as 
`urllib3.exceptions.ProtocolError`, `urllib3.exceptions.ConnectionError`, and 
`urllib3.exceptions.IncompleteRead`, then waits and retries, would fix this nicely.
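
   A minimal sketch of the wait-and-retry idea, using only the standard library.
The names `call_with_retry`, `SimulatedProtocolError`, and `flaky_log_read` are
hypothetical and exist only for illustration; none of them is part of Airflow,
the provider, or `kubernetes-client`. The stand-in exception takes the place of
the urllib3 errors so the example runs without extra dependencies.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    wait_seconds: float = 1.0,
) -> T:
    """Call func(); if it raises one of retry_on, wait and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            time.sleep(wait_seconds)
    raise AssertionError("unreachable")


class SimulatedProtocolError(Exception):
    """Stand-in for urllib3.exceptions.ProtocolError (illustration only)."""


calls = {"count": 0}


def flaky_log_read() -> str:
    """Fail on the first call, succeed afterwards (simulates a broken stream)."""
    calls["count"] += 1
    if calls["count"] == 1:
        raise SimulatedProtocolError("Connection broken: IncompleteRead")
    return "log line"


# The first call fails, the helper waits, and the second call succeeds.
result = call_with_retry(
    flaky_log_read, retry_on=(SimulatedProtocolError,), wait_seconds=0.0
)
```

   In the operator, `retry_on` would presumably be the urllib3 exceptions named
above and `func` would wrap the call that reads the pod logs. Where exactly this
belongs in the provider code is not decided here, and a real fix would likely
also need to resume the log stream from the point where it broke, which this
sketch does not attempt.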
   
   ### How to reproduce
   
   This is hard to reproduce because it happens only occasionally. However, 
the exception is likely to occur when using `KubernetesPodOperator` for a pod 
that runs for more than 1 hour.
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

