danielrolfes2307 opened a new issue, #43912:
URL: https://github.com/apache/airflow/issues/43912

   ### Apache Airflow Provider(s)
   
   cncf-kubernetes
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Apache Airflow version
   
   v2.10.2
   
   ### Operating System
   
   Debian GNU/Linux 12 (bookworm)
   
   ### Deployment
   
   Other 3rd-party Helm chart
   
   ### Deployment details
   
   Airflow is deployed via the "Airflow Helm Chart (User Community)" to a Kubernetes cluster (EKS).
   Pods are started via the KubernetesPodOperator.
   
   
   ### What happened
   
   Sometimes tasks/pods fail directly after starting, in `pod_manager.read_pod_logs`, with:
   ```
   kubernetes.client.exceptions.ApiException: (500)
   Reason: Internal Server Error
   HTTP response headers: HTTPHeaderDict({'Audit-Id': '70fd7044-c526-4061-9e80-ced705d0ccdc', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Mon, 11 Nov 2024 04:33:45 GMT', 'Content-Length': '249'})
   HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Get \\"https://172.17.134.89:10250/containerLogs/monitoring/kowalski-auto-ua72q9pi/base?follow=true\\u0026timestamps=true\\": remote error: tls: internal error","code":500}\n'
   ```
   Snippet of Log:
   ```
   [2024-11-11T04:33:45.253+0000] {taskinstance.py:3311} ERROR - Task failed with exception
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 767, in _execute_task
       result = _execute_callable(context=context, **execute_callable_kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 733, in _execute_callable
       return ExecutionCallableRunner(
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/operator_helpers.py", line 252, in run
       return self.func(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/baseoperator.py", line 417, in wrapper
       return func(self, *args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 594, in execute
       return self.execute_sync(context)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 627, in execute_sync
       self.await_pod_completion(pod=self.pod)
     File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 336, in wrapped_f
       return copy(f, *args, **kw)
     File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 475, in __call__
       do = self.iter(retry_state=retry_state)
     File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 376, in iter
       result = action(retry_state)
     File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 398, in <lambda>
       self._add_action_func(lambda rs: rs.outcome.result())
     File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in result
       return self.__get_result()
     File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
       raise self._exception
     File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 478, in __call__
       result = fn(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 678, in await_pod_completion
       raise exc
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 663, in await_pod_completion
       self.pod_manager.fetch_requested_container_logs(
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 587, in fetch_requested_container_logs
       status = self.fetch_container_logs(pod=pod, container_name=c, follow=follow_logs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 510, in fetch_container_logs
       last_log_time, exc = consume_logs(since_time=last_log_time)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 440, in consume_logs
       logs = self.read_pod_logs(
     File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 336, in wrapped_f
       return copy(f, *args, **kw)
     File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 475, in __call__
       do = self.iter(retry_state=retry_state)
     File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 376, in iter
       result = action(retry_state)
     File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 418, in exc_check
       raise retry_exc.reraise()
     File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 185, in reraise
       raise self.last_attempt.result()
     File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in result
       return self.__get_result()
     File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
       raise self._exception
     File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 478, in __call__
       result = fn(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 675, in read_pod_logs
       logs = self._client.read_namespaced_pod_log(
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py", line 23957, in read_namespaced_pod_log
       return self.read_namespaced_pod_log_with_http_info(name, namespace, **kwargs)  # noqa: E501
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py", line 24076, in read_namespaced_pod_log_with_http_info
       return self.api_client.call_api(
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 348, in call_api
       return self.__call_api(resource_path, method,
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
       response_data = self.request(
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 373, in request
       return self.rest_client.GET(url,
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/rest.py", line 244, in GET
       return self.request("GET", url,
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/rest.py", line 238, in request
       raise ApiException(http_resp=r)
   kubernetes.client.exceptions.ApiException: (500)
   Reason: Internal Server Error
   HTTP response headers: HTTPHeaderDict({'Audit-Id': '70fd7044-c526-4061-9e80-ced705d0ccdc', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Mon, 11 Nov 2024 04:33:45 GMT', 'Content-Length': '249'})
   HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Get \\"https://172.17.134.89:10250/containerLogs/monitoring/kowalski-auto-ua72q9pi/base?follow=true\\u0026timestamps=true\\": remote error: tls: internal error","code":500}\n'
   [2024-11-11T04:33:45.275+0000] {taskinstance.py:906} DEBUG - Task Duration set to 78.504676
   [2024-11-11T04:33:45.276+0000] {taskinstance.py:928} DEBUG - Clearing next_method and next_kwargs.
   [2024-11-11T04:33:45.277+0000] {taskinstance.py:1225} INFO - Marking task as FAILED. dag_id=kowalski-auto, task_id=kowalski-auto, run_id=kowalski-a2a-multilappen-dkha-d81ea11e-473c-418b-adf1-ac0d3295737e, execution_date=20241111T043036, start_date=20241111T043226, end_date=20241111T043345
   [2024-11-11T04:33:45.387+0000] {taskinstance.py:340} INFO - ::group::Post task execution logs
   [2024-11-11T04:33:45.388+0000] {cli_action_loggers.py:98} DEBUG - Calling callbacks: []
   [2024-11-11T04:33:45.389+0000] {standard_task_runner.py:124} ERROR - Failed to execute job 125683 for task kowalski-auto ((500)
   Reason: Internal Server Error
   HTTP response headers: HTTPHeaderDict({'Audit-Id': '70fd7044-c526-4061-9e80-ced705d0ccdc', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Mon, 11 Nov 2024 04:33:45 GMT', 'Content-Length': '249'})
   HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Get \\"https://172.17.134.89:10250/containerLogs/monitoring/kowalski-auto-ua72q9pi/base?follow=true\\u0026timestamps=true\\": remote error: tls: internal error","code":500}\n'
   ; 132)
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/task/task_runner/standard_task_runner.py", line 117, in _start_by_fork
       ret = args.func(args, dag=self.dag)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/cli_config.py", line 49, in command
       return func(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/cli.py", line 116, in wrapper
       return f(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 483, in task_run
       task_return_code = _run_task_by_selected_method(args, _dag, ti)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 256, in _run_task_by_selected_method
       return _run_raw_task(args, ti)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/cli/commands/task_command.py", line 341, in _run_raw_task
       return ti._run_raw_task(
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/session.py", line 97, in wrapper
       return func(*args, session=session, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 3005, in _run_raw_task
       return _run_raw_task(
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 273, in _run_raw_task
       TaskInstance._execute_task_with_callbacks(
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 3159, in _execute_task_with_callbacks
       result = self._execute_task(context, task_orig)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 3183, in _execute_task
       return _execute_task(self, context, task_orig)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 767, in _execute_task
       result = _execute_callable(context=context, **execute_callable_kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 733, in _execute_callable
       return ExecutionCallableRunner(
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/operator_helpers.py", line 252, in run
       return self.func(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/baseoperator.py", line 417, in wrapper
       return func(self, *args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 594, in execute
       return self.execute_sync(context)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 627, in execute_sync
       self.await_pod_completion(pod=self.pod)
     File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 336, in wrapped_f
       return copy(f, *args, **kw)
     File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 475, in __call__
       do = self.iter(retry_state=retry_state)
     File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 376, in iter
       result = action(retry_state)
     File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 398, in <lambda>
       self._add_action_func(lambda rs: rs.outcome.result())
     File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in result
       return self.__get_result()
     File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
       raise self._exception
     File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 478, in __call__
       result = fn(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 678, in await_pod_completion
       raise exc
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 663, in await_pod_completion
       self.pod_manager.fetch_requested_container_logs(
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 587, in fetch_requested_container_logs
       status = self.fetch_container_logs(pod=pod, container_name=c, follow=follow_logs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 510, in fetch_container_logs
       last_log_time, exc = consume_logs(since_time=last_log_time)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 440, in consume_logs
       logs = self.read_pod_logs(
     File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 336, in wrapped_f
       return copy(f, *args, **kw)
     File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 475, in __call__
       do = self.iter(retry_state=retry_state)
     File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 376, in iter
       result = action(retry_state)
     File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 418, in exc_check
       raise retry_exc.reraise()
     File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 185, in reraise
       raise self.last_attempt.result()
     File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in result
       return self.__get_result()
     File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
       raise self._exception
     File "/home/airflow/.local/lib/python3.8/site-packages/tenacity/__init__.py", line 478, in __call__
       result = fn(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 675, in read_pod_logs
       logs = self._client.read_namespaced_pod_log(
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py", line 23957, in read_namespaced_pod_log
       return self.read_namespaced_pod_log_with_http_info(name, namespace, **kwargs)  # noqa: E501
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py", line 24076, in read_namespaced_pod_log_with_http_info
       return self.api_client.call_api(
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 348, in call_api
       return self.__call_api(resource_path, method,
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
       response_data = self.request(
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 373, in request
       return self.rest_client.GET(url,
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/rest.py", line 244, in GET
       return self.request("GET", url,
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/rest.py", line 238, in request
       raise ApiException(http_resp=r)
   kubernetes.client.exceptions.ApiException: (500)
   Reason: Internal Server Error
   HTTP response headers: HTTPHeaderDict({'Audit-Id': '70fd7044-c526-4061-9e80-ced705d0ccdc', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Mon, 11 Nov 2024 04:33:45 GMT', 'Content-Length': '249'})
   HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Get \\"https://172.17.134.89:10250/containerLogs/monitoring/kowalski-auto-ua72q9pi/base?follow=true\\u0026timestamps=true\\": remote error: tls: internal error","code":500}\n'
   [2024-11-11T04:33:45.427+0000] {local_task_job_runner.py:266} INFO - Task exited with return code 1
   [2024-11-11T04:33:45.447+0000] {taskinstance.py:3859} DEBUG - Skip locked rows, rollback
   [2024-11-11T04:33:45.454+0000] {local_task_job_runner.py:245} INFO - ::endgroup::
   ```
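To distinguish this particular failure from other HTTP 500 responses, you can inspect the `Status` body that the API server returned. The helper below is a hypothetical, stdlib-only sketch and is not part of the cncf-kubernetes provider. It parses the body from the log above and checks for the `tls: internal error` message seen in this issue. The function name and the matched substring are assumptions drawn only from this log.

```python
import json


def is_kubelet_tls_error(body: bytes) -> bool:
    """Return True if a Kubernetes Status body reports a 500 with a TLS internal error.

    Hypothetical helper: the matched message is taken from the error in this issue.
    """
    try:
        status = json.loads(body)
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        return False
    if not isinstance(status, dict):
        return False  # not a Status object
    return (
        status.get("kind") == "Status"
        and status.get("code") == 500
        and "tls: internal error" in status.get("message", "")
    )


# Body copied from the ApiException in the log (a bytes literal containing JSON).
body = (
    b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",'
    b'"message":"Get \\"https://172.17.134.89:10250/containerLogs/monitoring/'
    b'kowalski-auto-ua72q9pi/base?follow=true\\u0026timestamps=true\\": '
    b'remote error: tls: internal error","code":500}\n'
)
print(is_kubelet_tls_error(body))  # True
```

Matching on the message text is brittle, because it depends on the exact wording from the API server and kubelet. Treat it as a diagnostic aid rather than a reliable classifier.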
   
   ### What you think should happen instead
   
   The exception in "What happened" reports `remote error: tls: internal error`.
   My assumption is that this happens because the CSR (certificate signing request) has not been approved (yet).
   
   The comment in `pod_manager.fetch_container_logs` states:

   > Between when the pod starts and logs being available, there might be a delay due to CSR not approved and signed yet. In such situation, ApiException is thrown. This is why we are **retrying** on this specific exception.
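The retry behavior described in that comment can be illustrated with a small, stdlib-only sketch. This is not the provider's code, which uses `tenacity`. The names `retry_on` and the `ApiException` stand-in are hypothetical. The sketch retries a call on one exception type with exponential backoff. Once the attempts are used up, it re-raises the original exception, so a retried call can still end with the same `ApiException` in the task log.

```python
import time
from functools import wraps


class ApiException(Exception):
    """Hypothetical stand-in for kubernetes.client.exceptions.ApiException."""

    def __init__(self, status: int):
        super().__init__(f"({status})")
        self.status = status


def retry_on(exc_type, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Retry the wrapped call on `exc_type` with exponential backoff.

    After the last attempt fails, the original exception propagates unchanged.
    """

    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except exc_type:
                    if attempt == attempts:
                        raise  # attempts exhausted: re-raise the last error as-is
                    sleep(base_delay * 2 ** (attempt - 1))  # waits 1s, 2s, 4s, ...

        return wrapper

    return decorator


# Demo: a log read that fails twice (e.g. while the CSR is still pending), then succeeds.
calls = []


@retry_on(ApiException, attempts=3, sleep=lambda _delay: None)  # skip real sleeping in the demo
def read_logs_demo():
    calls.append(1)
    if len(calls) < 3:
        raise ApiException(500)
    return "log line"


print(read_logs_demo(), len(calls))  # log line 3
```

The `sleep` parameter is injectable so the demo runs instantly. A real caller would keep the default `time.sleep`.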
   This leads me to believe that a failure to consume logs from a pod, because the CSR has not been approved yet, should lead to retries.
   Unfortunately, I do not see any retries being made in this case (see the log above).
   
   Is my assumption correct or am I missing something?
   Thanks :)
   
   ### How to reproduce
   
   Start a pod via the KubernetesPodOperator and begin consuming its logs before the CSR has been approved.
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   