danielrolfes2307 opened a new issue, #43912:
URL: https://github.com/apache/airflow/issues/43912
### Apache Airflow Provider(s)
cncf-kubernetes
### Versions of Apache Airflow Providers
_No response_
### Apache Airflow version
v2.10.2
### Operating System
Debian GNU/Linux 12 (bookworm)
### Deployment
Other 3rd-party Helm chart
### Deployment details
Deployment of Airflow via "Airflow Helm Chart (User Community)" to
Kubernetes Cluster (EKS).
Triggering Pods via KubernetesPodOperator.
### What happened
Tasks/pods sometimes fail right after starting, in `read_pod_logs`
(`pod_manager.py`), with:
```
kubernetes.client.exceptions.ApiException: (500)
Reason: Internal Server Error
HTTP response headers: HTTPHeaderDict({'Audit-Id':
'70fd7044-c526-4061-9e80-ced705d0ccdc', 'Cache-Control': 'no-cache, private',
'Content-Type': 'application/json', 'Date': 'Mon, 11 Nov 2024 04:33:45 GMT',
'Content-Length': '249'})
HTTP response body:
b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Get
\\"https://172.17.134.89:10250/containerLogs/monitoring/kowalski-auto-ua72q9pi/base?follow=true\\u0026timestamps=true\\":
remote error: tls: internal error","code":500}\n'
```
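For triage, the response above can be recognised programmatically. The following is a minimal sketch, not code from the provider: `is_kubelet_tls_error` and `SAMPLE_BODY` are hypothetical names, and the check simply looks for a 500 status whose JSON `message` contains `tls: internal error`, as in the body shown above.

```python
import json


def is_kubelet_tls_error(status_code: int, body: bytes) -> bool:
    """Return True if the response looks like the failure shown above.

    Hypothetical helper: a 500 response whose JSON "message" field
    contains "tls: internal error".
    """
    if status_code != 500:
        return False
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        # Body is not valid JSON (or not decodable), so it cannot match.
        return False
    if not isinstance(payload, dict):
        return False
    return "tls: internal error" in str(payload.get("message", ""))


# Response body copied from the exception above.
SAMPLE_BODY = (
    b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",'
    b'"message":"Get \\"https://172.17.134.89:10250/containerLogs/monitoring/'
    b'kowalski-auto-ua72q9pi/base?follow=true\\u0026timestamps=true\\": '
    b'remote error: tls: internal error","code":500}\n'
)

if __name__ == "__main__":
    print(is_kubelet_tls_error(500, SAMPLE_BODY))
```

With a real `ApiException`, the values to pass would presumably be its `status` and `body` attributes.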
Snippet of Log:
```
[2024-11-11T04:33:45.253+0000] {taskinstance.py:3311} ERROR - Task failed
with exception
Traceback (most recent call last):
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py",
line 767, in _execute_task
result = _execute_callable(context=context, **execute_callable_kwargs)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py",
line 733, in _execute_callable
return ExecutionCallableRunner(
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/operator_helpers.py",
line 252, in run
return self.func(*args, **kwargs)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/models/baseoperator.py",
line 417, in wrapper
return func(self, *args, **kwargs)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
line 594, in execute
return self.execute_sync(context)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
line 627, in execute_sync
self.await_pod_completion(pod=self.pod)
File
"/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line
336, in wrapped_f
return copy(f, *args, **kw)
File
"/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line
475, in __call__
do = self.iter(retry_state=retry_state)
File
"/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line
376, in iter
result = action(retry_state)
File
"/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line
398, in <lambda>
self._add_action_func(lambda rs: rs.outcome.result())
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in
result
return self.__get_result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in
__get_result
raise self._exception
File
"/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line
478, in __call__
result = fn(*args, **kwargs)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
line 678, in await_pod_completion
raise exc
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
line 663, in await_pod_completion
self.pod_manager.fetch_requested_container_logs(
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py",
line 587, in fetch_requested_container_logs
status = self.fetch_container_logs(pod=pod, container_name=c,
follow=follow_logs)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py",
line 510, in fetch_container_logs
last_log_time, exc = consume_logs(since_time=last_log_time)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py",
line 440, in consume_logs
logs = self.read_pod_logs(
File
"/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line
336, in wrapped_f
return copy(f, *args, **kw)
File
"/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line
475, in __call__
do = self.iter(retry_state=retry_state)
File
"/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line
376, in iter
result = action(retry_state)
File
"/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line
418, in exc_check
raise retry_exc.reraise()
File
"/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line
185, in reraise
raise self.last_attempt.result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in
result
return self.__get_result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in
__get_result
raise self._exception
File
"/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line
478, in __call__
result = fn(*args, **kwargs)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py",
line 675, in read_pod_logs
logs = self._client.read_namespaced_pod_log(
File
"/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py",
line 23957, in read_namespaced_pod_log
return self.read_namespaced_pod_log_with_http_info(name, namespace,
**kwargs) # noqa: E501
File
"/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py",
line 24076, in read_namespaced_pod_log_with_http_info
return self.api_client.call_api(
File
"/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py",
line 348, in call_api
return self.__call_api(resource_path, method,
File
"/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py",
line 180, in __call_api
response_data = self.request(
File
"/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py",
line 373, in request
return self.rest_client.GET(url,
File
"/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/rest.py",
line 244, in GET
return self.request("GET", url,
File
"/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/rest.py",
line 238, in request
raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (500)
Reason: Internal Server Error
HTTP response headers: HTTPHeaderDict({'Audit-Id':
'70fd7044-c526-4061-9e80-ced705d0ccdc', 'Cache-Control': 'no-cache, private',
'Content-Type': 'application/json', 'Date': 'Mon, 11 Nov 2024 04:33:45 GMT',
'Content-Length': '249'})
HTTP response body:
b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Get
\\"https://172.17.134.89:10250/containerLogs/monitoring/kowalski-auto-ua72q9pi/base?follow=true\\u0026timestamps=true\\":
remote error: tls: internal error","code":500}\n'
[2024-11-11T04:33:45.275+0000] {taskinstance.py:906} DEBUG - Task Duration
set to 78.504676
[2024-11-11T04:33:45.276+0000] {taskinstance.py:928} DEBUG - Clearing
next_method and next_kwargs.
[2024-11-11T04:33:45.277+0000] {taskinstance.py:1225} INFO - Marking task as
FAILED. dag_id=kowalski-auto, task_id=kowalski-auto,
run_id=kowalski-a2a-multilappen-dkha-d81ea11e-473c-418b-adf1-ac0d3295737e,
execution_date=20241111T043036, start_date=20241111T043226,
end_date=20241111T043345
[2024-11-11T04:33:45.387+0000] {taskinstance.py:340} INFO - ::group::Post
task execution logs
[2024-11-11T04:33:45.388+0000] {cli_action_loggers.py:98} DEBUG - Calling
callbacks: []
[2024-11-11T04:33:45.389+0000] {standard_task_runner.py:124} ERROR - Failed
to execute job 125683 for task kowalski-auto ((500)
Reason: Internal Server Error
HTTP response headers: HTTPHeaderDict({'Audit-Id':
'70fd7044-c526-4061-9e80-ced705d0ccdc', 'Cache-Control': 'no-cache, private',
'Content-Type': 'application/json', 'Date': 'Mon, 11 Nov 2024 04:33:45 GMT',
'Content-Length': '249'})
HTTP response body:
b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Get
\\"https://172.17.134.89:10250/containerLogs/monitoring/kowalski-auto-ua72q9pi/base?follow=true\\u0026timestamps=true\\":
remote error: tls: internal error","code":500}\n'
; 132)
Traceback (most recent call last):
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/task/task_runner/standard_task_runner.py",
line 117, in _start_by_fork
ret = args.func(args, dag=self.dag)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/cli_config.py",
line 49, in command
return func(*args, **kwargs)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/cli.py", line
116, in wrapper
return f(*args, **kwargs)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/commands/task_command.py",
line 483, in task_run
task_return_code = _run_task_by_selected_method(args, _dag, ti)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/commands/task_command.py",
line 256, in _run_task_by_selected_method
return _run_raw_task(args, ti)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/commands/task_command.py",
line 341, in _run_raw_task
return ti._run_raw_task(
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/session.py",
line 97, in wrapper
return func(*args, session=session, **kwargs)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py",
line 3005, in _run_raw_task
return _run_raw_task(
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py",
line 273, in _run_raw_task
TaskInstance._execute_task_with_callbacks(
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py",
line 3159, in _execute_task_with_callbacks
result = self._execute_task(context, task_orig)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py",
line 3183, in _execute_task
return _execute_task(self, context, task_orig)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py",
line 767, in _execute_task
result = _execute_callable(context=context, **execute_callable_kwargs)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py",
line 733, in _execute_callable
return ExecutionCallableRunner(
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/operator_helpers.py",
line 252, in run
return self.func(*args, **kwargs)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/models/baseoperator.py",
line 417, in wrapper
return func(self, *args, **kwargs)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
line 594, in execute
return self.execute_sync(context)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
line 627, in execute_sync
self.await_pod_completion(pod=self.pod)
File
"/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line
336, in wrapped_f
return copy(f, *args, **kw)
File
"/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line
475, in __call__
do = self.iter(retry_state=retry_state)
File
"/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line
376, in iter
result = action(retry_state)
File
"/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line
398, in <lambda>
self._add_action_func(lambda rs: rs.outcome.result())
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in
result
return self.__get_result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in
__get_result
raise self._exception
File
"/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line
478, in __call__
result = fn(*args, **kwargs)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
line 678, in await_pod_completion
raise exc
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
line 663, in await_pod_completion
self.pod_manager.fetch_requested_container_logs(
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py",
line 587, in fetch_requested_container_logs
status = self.fetch_container_logs(pod=pod, container_name=c,
follow=follow_logs)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py",
line 510, in fetch_container_logs
last_log_time, exc = consume_logs(since_time=last_log_time)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py",
line 440, in consume_logs
logs = self.read_pod_logs(
File
"/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line
336, in wrapped_f
return copy(f, *args, **kw)
File
"/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line
475, in __call__
do = self.iter(retry_state=retry_state)
File
"/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line
376, in iter
result = action(retry_state)
File
"/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line
418, in exc_check
raise retry_exc.reraise()
File
"/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line
185, in reraise
raise self.last_attempt.result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in
result
return self.__get_result()
File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in
__get_result
raise self._exception
File
"/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line
478, in __call__
result = fn(*args, **kwargs)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py",
line 675, in read_pod_logs
logs = self._client.read_namespaced_pod_log(
File
"/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py",
line 23957, in read_namespaced_pod_log
return self.read_namespaced_pod_log_with_http_info(name, namespace,
**kwargs) # noqa: E501
File
"/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py",
line 24076, in read_namespaced_pod_log_with_http_info
return self.api_client.call_api(
File
"/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py",
line 348, in call_api
return self.__call_api(resource_path, method,
File
"/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py",
line 180, in __call_api
response_data = self.request(
File
"/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py",
line 373, in request
return self.rest_client.GET(url,
File
"/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/rest.py",
line 244, in GET
return self.request("GET", url,
File
"/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/rest.py",
line 238, in request
raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (500)
Reason: Internal Server Error
HTTP response headers: HTTPHeaderDict({'Audit-Id':
'70fd7044-c526-4061-9e80-ced705d0ccdc', 'Cache-Control': 'no-cache, private',
'Content-Type': 'application/json', 'Date': 'Mon, 11 Nov 2024 04:33:45 GMT',
'Content-Length': '249'})
HTTP response body:
b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Get
\\"https://172.17.134.89:10250/containerLogs/monitoring/kowalski-auto-ua72q9pi/base?follow=true\\u0026timestamps=true\\":
remote error: tls: internal error","code":500}\n'
[2024-11-11T04:33:45.427+0000] {local_task_job_runner.py:266} INFO - Task
exited with return code 1
[2024-11-11T04:33:45.447+0000] {taskinstance.py:3859} DEBUG - Skip locked
rows, rollback
[2024-11-11T04:33:45.454+0000] {local_task_job_runner.py:245} INFO -
::endgroup::
```
### What you think should happen instead
The exception shown under "What happened" ends with "tls: internal error".
My assumption is that the CSR has not been approved (yet) at that point.
The comment in `fetch_container_logs` (`pod_manager.py`) states:
"
Between when the pod starts and logs being available, there might be
a delay due to CSR not approved
and signed yet. In such situation, ApiException is thrown. This is
why we are **retrying** on this
specific exception.
"
This leads me to believe that a failure to consume logs from a pod because
the CSR is not yet approved should lead to retries.
However, I do not see any retries being made in this case (see the provided
log).
Is my assumption correct, or am I missing something?
Thanks :)
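To make the expected behaviour concrete, here is a minimal sketch of a retry that logs each attempt. It is not the provider's implementation: `FakeApiException` is a hypothetical stand-in for `kubernetes.client.exceptions.ApiException`, and `read_logs_with_retry`, its parameters, and the backoff values are assumptions for illustration. As a side note, the tenacity `exc_check` and `reraise` frames in the log above may indicate that attempts were made and exhausted without being logged; that is a guess from the traceback, not a confirmed reading of the provider code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class FakeApiException(Exception):
    """Hypothetical stand-in for kubernetes.client.exceptions.ApiException."""

    def __init__(self, status: int, body: str) -> None:
        super().__init__(f"({status}) {body}")
        self.status = status
        self.body = body


def read_logs_with_retry(
    read_logs: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> T:
    """Call read_logs, retrying with exponential backoff on the TLS error.

    Only a 500 response mentioning "tls: internal error" is treated as
    transient; any other error, or the last failed attempt, is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return read_logs()
        except FakeApiException as exc:
            transient = exc.status == 500 and "tls: internal error" in exc.body
            if not transient or attempt == attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            # Logging every retry makes the attempts visible in the task log.
            log(f"attempt {attempt}/{attempts} hit a TLS error; retrying in {delay:.1f}s")
            sleep(delay)
    raise AssertionError("unreachable")
```

Logging inside the loop is the point of the sketch: with a message per attempt, the task log would show whether retries happened before the final failure.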
### How to reproduce
Start a pod via KubernetesPodOperator and start consuming its logs before the
CSR is approved.
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)