[ 
https://issues.apache.org/jira/browse/AIRFLOW-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888218#comment-16888218
 ] 

Greg Ferrar edited comment on AIRFLOW-4991 at 7/18/19 6:15 PM:
---------------------------------------------------------------

Making this hack to pod_launcher.py fixes this (for me and only me, because of 
the hard-coded parameters to `get_kube_client()`):
{code:java}
def read_pod(self, pod):
    # Hack: rebuild the kube client on every call so that it re-authenticates.
    # The arguments below are hard-coded for one specific environment.
    print("###GMF Attempting to reload kube client to re-authenticate, "
          "by calling get_kube_client")
    self._client = get_kube_client(
        False, None, '/home/ubuntu/.kube/kubeconfig_gmf-telemetry')

    try:
        return self._client.read_namespaced_pod(pod.name, pod.namespace)
    except BaseHTTPError as e:
        raise AirflowException(
            'There was an error reading the kubernetes API: {}'.format(e)
        )
This does not seem like the best solution to me. What I'm doing above is 
recreating the kube client, which indirectly calls `_load_authentication()`. I 
don't see an immediate pretty way to just reload the authentication.

Also, this reloads the client (and reauthenticates to EKS) *every* time 
`read_pod()` is called, which I suspect is extreme overkill, and might even 
break some other non-EKS system.

A better approach might be to call `_load_authentication()` (maybe through a 
new public method `reload_authentication()`) only when there is an exception 
containing "401", and rely on tenacity to rerun `read_pod()` after the 
re-authentication.
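
As a rough sketch of that idea, untested against a real cluster: the code below re-authenticates only when the error message contains "401" and then retries the read. `FakeClient` and `Unauthorized` are hypothetical stand-ins for the Kubernetes client and its HTTP error, `reload_authentication()` is the proposed method (it does not exist today), and the hand-rolled loop stands in for what tenacity would do.

```python
class Unauthorized(Exception):
    """Hypothetical stand-in for the HTTP 401 error raised by the kube client."""


class FakeClient:
    """Hypothetical client whose token is rejected until it is reloaded."""

    def __init__(self):
        self.token_valid = False
        self.reloads = 0

    def reload_authentication(self):
        # Proposed public wrapper around _load_authentication() (an assumption).
        self.reloads += 1
        self.token_valid = True

    def read_namespaced_pod(self, name, namespace):
        if not self.token_valid:
            raise Unauthorized("(401) Reason: Unauthorized")
        return {"name": name, "namespace": namespace}


def read_pod(client, name, namespace, max_attempts=2):
    """Read a pod; on a 401, reload authentication and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return client.read_namespaced_pod(name, namespace)
        except Exception as e:
            # Only a 401 is worth a re-authentication; re-raise anything else,
            # and give up once the attempts are used up.
            if "401" not in str(e) or attempt == max_attempts:
                raise
            client.reload_authentication()


client = FakeClient()
pod = read_pod(client, "worker", "default")
print(pod, client.reloads)
```

With this shape the extra authentication cost is paid only after a token has actually expired, so non-EKS setups that never see a 401 are left alone.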

Another approach would be to redo authentication every ten minutes regardless, 
maybe using another thread; this should prevent the exception from ever 
occurring.
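
A minimal sketch of the timer-thread variant, using only the standard library. `AuthRefresher` is a hypothetical helper, and the `reload` callable is assumed to be whatever ends up re-running the authentication; here it just records a timestamp, and the interval is shortened so the example finishes quickly.

```python
import threading
import time


class AuthRefresher:
    """Hypothetical helper: calls `reload` every `interval` seconds."""

    def __init__(self, reload, interval):
        self._reload = reload
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait returns False on timeout and True once stop() is called,
        # so the loop reloads after each interval until it is told to stop.
        while not self._stop.wait(self._interval):
            self._reload()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


calls = []
refresher = AuthRefresher(lambda: calls.append(time.monotonic()), interval=0.01)
refresher.start()
time.sleep(0.2)
refresher.stop()
print(len(calls))
```

For the ten-minute schedule the interval would be 600. A real version would also have to decide what happens when the reload itself fails, and whether replacing the client from another thread is safe while `read_pod()` is running.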

*And of course, the hard-coded parameters to `get_kube_client()` won't do at 
all.* So any real solution will need to *determine* the config file location, 
cluster info, etc., and pass proper parameters to `get_kube_client()` (or 
`_load_authentication()`). The hack above obviously works only for my specific 
environment.
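
For the config file location, one possible lookup order is sketched below: an explicitly supplied path first, then the `KUBECONFIG` environment variable, then the conventional `~/.kube/config`. `resolve_kube_config_path` is a hypothetical helper, not something Airflow provides, and taking only the first entry of a multi-file `KUBECONFIG` is a simplification.

```python
import os


def resolve_kube_config_path(explicit_path=None, environ=None):
    """Pick a kubeconfig path: explicit argument, then KUBECONFIG, then default."""
    environ = os.environ if environ is None else environ
    if explicit_path:
        return os.path.expanduser(explicit_path)
    from_env = environ.get("KUBECONFIG", "")
    # KUBECONFIG may list several files separated by os.pathsep;
    # this sketch simply takes the first non-empty entry.
    first = next((p for p in from_env.split(os.pathsep) if p), None)
    if first:
        return os.path.expanduser(first)
    return os.path.join(os.path.expanduser("~"), ".kube", "config")


print(resolve_kube_config_path())
```

The result would then be passed to `get_kube_client()` in place of the hard-coded path; the cluster info would need a similar lookup.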



> 401 Unauthorized in read_pod() when EKS Kubernetes TaskInstance runs longer 
> than 15 minutes
> -------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-4991
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4991
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: authentication
>    Affects Versions: 1.10.3
>            Reporter: Greg Ferrar
>            Priority: Minor
>
> Using KubernetesOperator with EKS, tasks that run more than 15 minutes result 
> in a `401 Unauthorized` error after the worker script successfully 
> completes.
> {code:java}
> [2019-07-10 20:46:32,509] {__init__.py:1580} ERROR - (401)
> Reason: Unauthorized
> HTTP response headers: HTTPHeaderDict({'Audit-Id': 
> '0587c262-eec3-4d20-a2ce-925e1bdd51eb', 'Content-Type': 'application/json', 
> 'Date': 'Wed, 10 Jul 2019 20:46:32 GMT', 'Content-Length': '129'})
> {code}
> This is due to EKS having a 15-minute timeout on its authentication token.
> The solution is to re-authenticate with EKS at least every fifteen minutes, 
> or maybe just at the end of the job.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
