[ 
https://issues.apache.org/jira/browse/AIRFLOW-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888218#comment-16888218
 ] 

Greg Ferrar commented on AIRFLOW-4991:
--------------------------------------

Making this hack to pod_launcher.py fixes this:
{code:python}
def read_pod(self, pod):
    print("###GMF in read_pod()")

    # GMF debug: rebuild the kube client on every call to force re-authentication
    print("###GMF Attempting to reload kube client to re-authenticate, "
          "by calling get_kube_client")
    self._client = get_kube_client(
        False, None, '/home/ubuntu/.kube/kubeconfig_gmf-telemetry')
    print("###GMF DONE calling get_kube_client()")

    try:
        return self._client.read_namespaced_pod(pod.name, pod.namespace)
    except BaseHTTPError as e:
        print("###GMF Exception in read_pod()")
        raise AirflowException(
            'There was an error reading the kubernetes API: {}'.format(e)
        )
{code}
This does not seem like the best solution to me. What I'm doing above is 
recreating the kube client, which indirectly calls `_load_authentication()`. I 
don't see an obvious clean way to reload only the authentication.

Also, this reloads the client (and reauthenticates to EKS) *every* time 
`read_pod()` is called, which I suspect is extreme overkill, and might even 
break some other non-EKS system.

A better approach might be to call `_load_authentication()` (maybe through a 
new public method `reload_authentication()`) only when there is an exception 
containing "401", and rely on tenacity to rerun `read_pod()` after the 
re-authentication.
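
A rough sketch of that idea follows. It is not the actual pod_launcher.py 
code. `PodLauncherSketch`, `Unauthorized`, `ExpiringClient` and the 
`client_factory` argument are hypothetical names made up for illustration, and 
a plain retry loop stands in for tenacity so the sketch has no dependencies. 
The fake exception mimics the `status` attribute found on the kubernetes 
client's `ApiException`.

```python
class Unauthorized(Exception):
    """Stand-in for a Kubernetes API error carrying an HTTP 401 status."""
    status = 401


class ExpiringClient:
    """Fake client whose token is only good for a fixed number of calls."""

    def __init__(self, valid_calls):
        self._remaining = valid_calls

    def read_namespaced_pod(self, name, namespace):
        if self._remaining <= 0:
            raise Unauthorized("401 Unauthorized")
        self._remaining -= 1
        return {"name": name, "namespace": namespace}


class PodLauncherSketch:
    """Hypothetical, trimmed-down launcher; not the real Airflow class."""

    def __init__(self, client_factory, max_attempts=3):
        # client_factory is an assumed zero-argument callable that returns a
        # freshly authenticated client (the role get_kube_client plays above).
        self._client_factory = client_factory
        self._max_attempts = max_attempts
        self._client = client_factory()

    def reload_authentication(self):
        """Hypothetical public hook: rebuild the client to obtain a new token."""
        self._client = self._client_factory()

    def read_pod(self, name, namespace):
        last_error = None
        for _ in range(self._max_attempts):
            try:
                return self._client.read_namespaced_pod(name, namespace)
            except Exception as e:
                # Only a 401 triggers re-authentication; other errors propagate.
                if getattr(e, "status", None) != 401:
                    raise
                last_error = e
                self.reload_authentication()
        raise RuntimeError(
            "Still unauthorized after {} attempts: {}".format(
                self._max_attempts, last_error))
```

With this shape, the client is rebuilt only after a 401, so non-EKS setups 
whose credentials do not expire never pay the re-authentication cost.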

> 401 error when EKS task runs longer than 15 minutes
> ---------------------------------------------------
>
>                 Key: AIRFLOW-4991
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4991
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: authentication
>    Affects Versions: 1.10.3
>            Reporter: Greg Ferrar
>            Priority: Minor
>
> Using KubernetesOperator with EKS, tasks that run more than 15 minutes result 
> in a `401 Unauthenticated` error after the worker script successfully 
> completes.
> This is due to EKS having a 15-minute timeout on its authentication token.
> Solution is to re-authenticate with EKS at least every fifteen minutes, or 
> maybe just at the end of the job.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
