[ 
https://issues.apache.org/jira/browse/AIRFLOW-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844054#comment-16844054
 ] 

ASF GitHub Bot commented on AIRFLOW-4393:
-----------------------------------------

dimberman commented on pull request #5284: [AIRFLOW-4393] Add retry logic when 
fetching pod status and/or logs in KubernetesPodOperator
URL: https://github.com/apache/airflow/pull/5284
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Add retry logic when fetching pod status and/or logs in KubernetesPodOperator
> -----------------------------------------------------------------------------
>
>                 Key: AIRFLOW-4393
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4393
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Olivier Van Goethem
>            Priority: Minor
>
> Over the last few weeks we have observed two occasions where:
>  * KubernetesPodOperator successfully launches the pod
>  * KubernetesPodOperator then fails when attempting to check the pod 
> status/logs due to: 
> {code:java}
> NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 
> 0x7f1329dc0cf8>: Failed to establish a new connection: [Errno 111] Connection 
> refused',)
> {code}
>  * KubernetesPodOperator launches another pod
>  * We now have 2 pods running in parallel
>
> The 'Connection refused' error is caused by a transient network error.
> The KubernetesPodOperator should therefore retry checking the pod
> status/logs rather than failing the task.
> Airflow is being run through GCP's Cloud Composer.
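
The retry behaviour requested above could be sketched as follows. This is a minimal, stdlib-only illustration, not the code from pull request #5284. `retry_on_transient_error` and `read_pod_phase` are hypothetical names, the stand-in function simulates a pod status check instead of calling the Kubernetes API, and the default `ConnectionError` is only a placeholder. Which exception types a real status/log read should retry on (the error above comes from urllib3, surfaced through the Kubernetes Python client) is an assumption to check against the client version in use.

```python
import functools
import time
from typing import Callable, Tuple, Type


def retry_on_transient_error(
    exceptions: Tuple[Type[BaseException], ...] = (ConnectionError,),
    max_attempts: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Hypothetical helper (not an Airflow API): retry a call with
    exponential backoff when it raises one of `exceptions`."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the original error
                    # Delays grow 1s, 2s, 4s, ... and are capped at max_delay.
                    sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
        return wrapper
    return decorator


# Usage with a hypothetical stand-in for a pod status read that is refused
# twice (as in the transient error above) and then succeeds.
calls = {"n": 0}
delays = []


@retry_on_transient_error(max_attempts=3, sleep=delays.append)
def read_pod_phase(pod_name):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionRefusedError(111, "Connection refused")
    return "Running"


print(read_pod_phase("example-pod"))  # Running
print(delays)  # [1.0, 2.0]
```

In this sketch both the number of attempts and the backoff delay are bounded, so a persistent outage still fails the read after a predictable wait. Because only the status/log read is retried, the pod that was already launched keeps being monitored, rather than the task failing and a second pod being started.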



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
