dadonnelly316 opened a new issue, #28836:
URL: https://github.com/apache/airflow/issues/28836
### Apache Airflow version
2.5.0
### What happened
The Airflow scheduler calls the Kubernetes API to create a pod for a task
run, but the API returns a 4XX or 5XX HTTP response code. After that, all
subsequent Airflow tasks are stuck in the "queued" or "scheduled" state. The
scheduler must be restarted before tasks can enter the running state again.
Similar to #28328, but we are not seeing the ConnectionResetError exception
when calling Executor.end.
```
airflow-scheduler Exception when attempting to create Namespaced Pod
airflow-scheduler Traceback (most recent call last):
File
"/usr/local/lib/python3.9/site-packages/airflow/executors/kubernetes_executor.py",
line 269, in run_pod_async
resp = self.kube_client.create_namespaced_pod(
File
"/usr/local/lib/python3.9/site-packages/kubernetes/client/api/core_v1_api.py",
line 7356, in create_namespaced_pod
return self.create_namespaced_pod_with_http_info(namespace, body,
**kwargs) # noqa: E501
File
"/usr/local/lib/python3.9/site-packages/kubernetes/client/api/core_v1_api.py",
line 7455, in create_namespaced_pod_with_http_info
return self.api_client.call_api(
File
"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line
348, in call_api
return self.__call_api(resource_path, method,
File
"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line
180, in __call_api
response_data = self.request(
File
"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line
391, in request
return self.rest_client.POST(url,
File "/usr/local/lib/python3.9/site-packages/kubernetes/client/rest.py",
line 275, in POST
return self.request("POST", url,
File "/usr/local/lib/python3.9/site-packages/kubernetes/client/rest.py",
line 234, in request
raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (500)
airflow-scheduler Reason: Internal Server Error
airflow-scheduler urllib3.exceptions.ProtocolError: ('Connection aborted.',
RemoteDisconnected('Remote end closed connection without response'))
Exception when executing SchedulerJob._run_scheduler_loop
airflow-scheduler Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py",
line 703, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py",
line 449, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py",
line 444, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.9/http/client.py", line 1377, in getresponse
response.begin()
File "/usr/local/lib/python3.9/http/client.py", line 320, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.9/http/client.py", line 289, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
airflow-scheduler http.client.RemoteDisconnected: Remote end closed
connection without response
airflow-scheduler During handling of the above exception, another exception
occurred:
airflow-scheduler Traceback (most recent call last):
File
"/usr/local/lib/python3.9/site-packages/airflow/jobs/scheduler_job.py", line
759, in _execute
self._run_scheduler_loop()
File
"/usr/local/lib/python3.9/site-packages/airflow/jobs/scheduler_job.py", line
887, in _run_scheduler_loop
self.executor.heartbeat()
File
"/usr/local/lib/python3.9/site-packages/airflow/executors/base_executor.py",
line 175, in heartbeat
self.sync()
File
"/usr/local/lib/python3.9/site-packages/airflow/executors/kubernetes_executor.py",
line 632, in sync
self.kube_scheduler.run_next(task)
File
"/usr/local/lib/python3.9/site-packages/airflow/executors/kubernetes_executor.py",
line 344, in run_next
self.run_pod_async(pod, **self.kube_config.kube_client_request_args)
File
"/usr/local/lib/python3.9/site-packages/airflow/executors/kubernetes_executor.py",
line 275, in run_pod_async
raise e
File
"/usr/local/lib/python3.9/site-packages/airflow/executors/kubernetes_executor.py",
line 269, in run_pod_async
resp = self.kube_client.create_namespaced_pod(
File
"/usr/local/lib/python3.9/site-packages/kubernetes/client/api/core_v1_api.py",
line 7356, in create_namespaced_pod
return self.create_namespaced_pod_with_http_info(namespace, body,
**kwargs) # noqa: E501
File
"/usr/local/lib/python3.9/site-packages/kubernetes/client/api/core_v1_api.py",
line 7455, in create_namespaced_pod_with_http_info
return self.api_client.call_api(
File
"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line
348, in call_api
return self.__call_api(resource_path, method,
File
"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line
180, in __call_api
response_data = self.request(
File
"/usr/local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line
391, in request
return self.rest_client.POST(url,
File "/usr/local/lib/python3.9/site-packages/kubernetes/client/rest.py",
line 275, in POST
return self.request("POST", url,
File "/usr/local/lib/python3.9/site-packages/kubernetes/client/rest.py",
line 168, in request
r = self.pool_manager.request(
File "/usr/local/lib/python3.9/site-packages/urllib3/request.py", line 78,
in request
return self.request_encode_body(
File "/usr/local/lib/python3.9/site-packages/urllib3/request.py", line
170, in request_encode_body
return self.urlopen(method, url, **extra_kw)
File "/usr/local/lib/python3.9/site-packages/urllib3/poolmanager.py", line
376, in urlopen
response = conn.urlopen(method, u.request_uri, **kw)
File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py",
line 787, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.9/site-packages/urllib3/util/retry.py", line
550, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.9/site-packages/urllib3/packages/six.py",
line 769, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py",
line 703, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py",
line 449, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py",
line 444, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.9/http/client.py", line 1377, in getresponse
response.begin()
File "/usr/local/lib/python3.9/http/client.py", line 320, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.9/http/client.py", line 289, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
airflow-scheduler urllib3.exceptions.ProtocolError: ('Connection aborted.',
RemoteDisconnected('Remote end closed connection without response'))
airflow-scheduler error Unknown error in KubernetesJobWatcher. Failing
airflow-scheduler Traceback (most recent call last):
File
"/usr/local/lib/python3.9/site-packages/airflow/executors/kubernetes_executor.py",
line 104, in run
self.resource_version = self._run(
File
"/usr/local/lib/python3.9/site-packages/airflow/executors/kubernetes_executor.py",
line 166, in _run
self.process_status(
File
"/usr/local/lib/python3.9/site-packages/airflow/executors/kubernetes_executor.py",
line 218, in process_status
self.watcher_queue.put((pod_id, namespace, State.FAILED, annotations,
resource_version))
File "<string>", line 2, in put
File "/usr/local/lib/python3.9/multiprocessing/managers.py", line 809, in
_callmethod
conn.send((self._id, methodname, args, kwds))
File "/usr/local/lib/python3.9/multiprocessing/connection.py", line 206,
in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/usr/local/lib/python3.9/multiprocessing/connection.py", line 411,
in _send_bytes
self._send(header + buf)
File "/usr/local/lib/python3.9/multiprocessing/connection.py", line 368,
in _send
n = write(self._handle, buf)
airflow-scheduler BrokenPipeError: [Errno 32] Broken pipe
airflow-scheduler Process KubernetesJobWatcher-5:
airflow-scheduler Traceback (most recent call last):
File "/usr/local/lib/python3.9/multiprocessing/process.py", line 315, in
_bootstrap
self.run()
File
"/usr/local/lib/python3.9/site-packages/airflow/executors/kubernetes_executor.py",
line 104, in run
self.resource_version = self._run(
File
"/usr/local/lib/python3.9/site-packages/airflow/executors/kubernetes_executor.py",
line 166, in _run
self.process_status(
File
"/usr/local/lib/python3.9/site-packages/airflow/executors/kubernetes_executor.py",
line 218, in process_status
self.watcher_queue.put((pod_id, namespace, State.FAILED, annotations,
resource_version))
File "<string>", line 2, in put
File "/usr/local/lib/python3.9/multiprocessing/managers.py", line 809, in
_callmethod
conn.send((self._id, methodname, args, kwds))
File "/usr/local/lib/python3.9/multiprocessing/connection.py", line 206,
in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/usr/local/lib/python3.9/multiprocessing/connection.py", line 411,
in _send_bytes
self._send(header + buf)
File "/usr/local/lib/python3.9/multiprocessing/connection.py", line 368,
in _send
n = write(self._handle, buf)
airflow-scheduler BrokenPipeError: [Errno 32] Broken pipe
```
### What you think should happen instead
The scheduler should handle ApiException instead of letting it escape the
scheduler loop. We've seen this error for multiple 4XX and 5XX response codes.
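A rough sketch of the kind of handling I have in mind, not the actual executor code. The two exception classes below are stand-ins for `kubernetes.client.exceptions.ApiException` and `urllib3.exceptions.ProtocolError` so the snippet runs without either library installed. `run_next_safely`, `create_pod`, `task_queue`, and `max_attempts` are hypothetical names, not Airflow APIs. The idea is to requeue the task on a failed pod creation rather than re-raising into the scheduler loop.

```python
import logging
from collections import deque

log = logging.getLogger(__name__)


class ApiException(Exception):
    """Stand-in for kubernetes.client.exceptions.ApiException (assumption)."""

    def __init__(self, status, reason=""):
        super().__init__(f"({status}) Reason: {reason}")
        self.status = status
        self.reason = reason


class ProtocolError(Exception):
    """Stand-in for urllib3.exceptions.ProtocolError (assumption)."""


def run_next_safely(task, create_pod, task_queue, max_attempts=3):
    """Try to create a pod for `task`; requeue on API or connection errors.

    `task` is a (key, attempts_so_far) tuple. `create_pod` is any callable
    that raises ApiException or ProtocolError on failure. Returns True if
    the pod was created, False if the attempt failed.
    """
    key, attempts = task
    try:
        create_pod(key)
        return True
    except (ApiException, ProtocolError) as exc:
        if attempts + 1 < max_attempts:
            # Put the task back so a later sync can retry it.
            log.warning("Pod creation for %s failed (%s); requeueing", key, exc)
            task_queue.append((key, attempts + 1))
        else:
            # Give up on this task without taking the scheduler loop down.
            log.error("Pod creation for %s failed %d times; giving up", key, attempts + 1)
        return False
```

With something like this, a single 500 from the API server would cost one retry for one task instead of leaving every later task stuck until the scheduler is restarted. Whether to retry or fail the task probably depends on the status code (for example, a 4XX caused by an invalid pod spec is unlikely to succeed on retry).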
### How to reproduce
_No response_
### Operating System
Debian GNU/Linux 11 (bullseye)
### Versions of Apache Airflow Providers
_No response_
### Deployment
Other
### Deployment details
Kubernetes deployment
### Anything else
It's difficult to tell how often this issue occurs since it can go unnoticed
in a CI environment where the scheduler is often restarted.
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)