[
https://issues.apache.org/jira/browse/AIRFLOW-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888218#comment-16888218
]
Greg Ferrar edited comment on AIRFLOW-4991 at 7/18/19 6:15 PM:
---------------------------------------------------------------
Making this hack to pod_launcher.py fixes this (for me and only me, because of
the hard-coded parameters to `get_kube_client()`):
{code:java}
def read_pod(self, pod):
    # Hack: rebuild the kube client on every call so that it re-authenticates.
    # The arguments below are hard-coded for my environment.
    print("###GMF Attempting to reload kube client to re-authenticate, "
          "by calling get_kube_client")
    self._client = get_kube_client(
        False, None, '/home/ubuntu/.kube/kubeconfig_gmf-telemetry')
    try:
        return self._client.read_namespaced_pod(pod.name, pod.namespace)
    except BaseHTTPError as e:
        raise AirflowException(
            'There was an error reading the kubernetes API: {}'.format(e)
        )
{code}
This does not seem like the best solution to me. What I'm doing above is
recreating the kube client, which indirectly calls `_load_authentication()`. I
don't see an immediate pretty way to just reload the authentication.
Also, this reloads the client (and reauthenticates to EKS) *every* time
`read_pod()` is called, which I suspect is extreme overkill, and might even
break some other non-EKS system.
A better approach might be to call `_load_authentication()` (maybe through a
new public method `reload_authentication()`) only when there is an exception
containing "401", and rely on tenacity to rerun `read_pod()` after the
re-authentication.
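A rough sketch of that "re-authenticate only on 401" idea, under stated assumptions: `ApiError`, `PodReader` and `client_factory` are hypothetical names made up for illustration, not Airflow or Kubernetes client APIs, and a plain loop stands in for the tenacity retry so the example has no third-party dependencies.

```python
class ApiError(Exception):
    """Hypothetical stand-in for the client's HTTP error; carries a status code."""

    def __init__(self, status):
        super().__init__("({}) Reason: error".format(status))
        self.status = status


class PodReader:
    """Reads pods and rebuilds its client only after a 401 response."""

    def __init__(self, client_factory, max_attempts=3):
        # client_factory is an assumed callable that returns a freshly
        # authenticated client (the role get_kube_client plays in the hack).
        self._client_factory = client_factory
        self._client = client_factory()
        self._max_attempts = max_attempts

    def reload_authentication(self):
        # Re-authenticate by building a new client.
        self._client = self._client_factory()

    def read_pod(self, name, namespace):
        for attempt in range(1, self._max_attempts + 1):
            try:
                return self._client.read_namespaced_pod(name, namespace)
            except ApiError as e:
                # Anything other than 401, or a 401 on the last attempt,
                # is passed on to the caller unchanged.
                if e.status != 401 or attempt == self._max_attempts:
                    raise
                self.reload_authentication()
```

With this shape the normal path costs nothing extra, and a system whose tokens never expire never triggers the reload.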
Another approach would be to redo authentication every ten minutes regardless,
maybe using another thread; this should prevent the exception from ever
occurring.
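The timer-based variant could look roughly like the sketch below. `PeriodicRefresher` and the `refresh` callable are hypothetical; the callable is assumed to do whatever re-authentication is needed (for example, rebuilding the client). The ten-minute default follows the interval suggested above.

```python
import threading


class PeriodicRefresher:
    """Calls refresh() every interval_seconds on a background thread."""

    def __init__(self, refresh, interval_seconds=600.0):
        self._refresh = refresh
        self._interval = interval_seconds
        self._stop = threading.Event()
        # Daemon thread, so a forgotten stop() does not block interpreter exit.
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait() returns False on timeout and True once stop() is called,
        # so the loop refreshes once per interval until it is stopped.
        while not self._stop.wait(self._interval):
            self._refresh()
```

One caveat with this design: the refresh runs concurrently with `read_pod()`, so swapping the client probably needs a lock or an atomic reference assignment.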
*And of course, the hard-coded parameters to `get_kube_client()` won't do at
all.* So any real solution will need to *determine* the config file location,
cluster info, etc., and pass proper parameters to `get_kube_client()` (or
`_load_authentication()`). The hack above obviously works only for my specific
environment.
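As one possible starting point for determining the config file location, here is a minimal sketch that follows the usual kubectl convention: an explicit path wins, then the `KUBECONFIG` environment variable, then `~/.kube/config`. `resolve_kubeconfig` is a hypothetical helper, not something Airflow provides, and how the path would actually reach `get_kube_client()` is left open.

```python
import os


def resolve_kubeconfig(explicit_path=None, environ=None):
    """Return a kubeconfig path: explicit path, then KUBECONFIG, then default."""
    environ = os.environ if environ is None else environ
    if explicit_path:
        return os.path.expanduser(explicit_path)
    # KUBECONFIG may list several paths separated by os.pathsep;
    # this sketch simply takes the first non-empty entry.
    for candidate in environ.get("KUBECONFIG", "").split(os.pathsep):
        if candidate:
            return os.path.expanduser(candidate)
    return os.path.expanduser(os.path.join("~", ".kube", "config"))
```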
> 401 Unauthorized in read_pod() when EKS Kubernetes TaskInstance runs longer
> than 15 minutes
> -------------------------------------------------------------------------------------------
>
> Key: AIRFLOW-4991
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4991
> Project: Apache Airflow
> Issue Type: Bug
> Components: authentication
> Affects Versions: 1.10.3
> Reporter: Greg Ferrar
> Priority: Minor
>
> Using KubernetesOperator with EKS, tasks that run more than 15 minutes result
> in a `401 Unauthorized` error after the worker script successfully
> completes.
> {code:java}
> [2019-07-10 20:46:32,509] {__init__.py:1580} ERROR - (401)
> Reason: Unauthorized
> HTTP response headers: HTTPHeaderDict({'Audit-Id':
> '0587c262-eec3-4d20-a2ce-925e1bdd51eb', 'Content-Type': 'application/json',
> 'Date': 'Wed, 10 Jul 2019 20:46:32 GMT', 'Content-Length': '129'})
> {code}
> This is due to EKS having a 15-minute timeout on its authentication token.
> Solution is to re-authenticate with EKS at least every fifteen minutes, or
> maybe just at the end of the job.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)