rcrchawla opened a new issue, #63243:
URL: https://github.com/apache/airflow/issues/63243
### Body
An Airflow task was marked as failed while the Spark Kubernetes app it was
tracking was still running. The Spark app is long-running, typically around
1-2 hours. Many tasks run concurrently at the time of the failure, which
usually happens between 02:30 and 03:45 UTC.
Q) What is causing the issue?
A) The Airflow task failed while the Spark Kubernetes app was still running.
Airflow version -- **3.0.4**
Setup config:
- 2 API servers
- 2 workers
- 1 DAG processor
- 2 schedulers

Deployment: Helm chart on Azure Kubernetes

Please check the logs below.
Worker logs :
-------------------------------------
2026-03-10 02:33:56.191330 [info ] Task
execute_workload[8cbabf91-009f-44a6-86d1-bef109c70341] succeeded in
2715.019189195242s: None [celery.app.trace]
2026-03-10 02:39:57.112078 [info ] Task finished
[supervisor] duration=1723.7576029417105 exit_code=0 final_state=success
2026-03-10 02:39:57.128929 [info ] Task
execute_workload[9b3f27ec-09b5-424e-8d5c-412e541f51e8] succeeded in
1723.8186896019615s: None [celery.app.trace]
2026-03-10 02:40:50.688403 [info ] Task finished
[supervisor] duration=744.0669570546597 exit_code=0 final_state=success
2026-03-10 02:40:50.705538 [info ] Task
execute_workload[b08ac31a-2ee7-4029-b897-753157b18475] succeeded in
744.139388079755s: None [celery.app.trace]
2026-03-10 02:42:11.649891 [info ] Task finished
[supervisor] duration=756.7588595808484 exit_code=0 final_state=success
2026-03-10 02:42:11.666368 [info ] Task
execute_workload[0351c271-194e-4e58-87e4-a9c224351ab1] succeeded in
756.8229349320754s: None [celery.app.trace]
2026-03-10 02:43:37.239128 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 1st time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:38.119304 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 1st time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:38.640468 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 1st time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:39.247588 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 1st time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:39.425843 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 1st time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:39.618220 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 1st time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:40.002999 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 1st time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:40.582177 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 1st time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:41.186771 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 1st time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:41.510710 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 1st time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:42.658853 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 2nd time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:43.171303 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 2nd time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:43.826966 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 2nd time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:44.330891 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 2nd time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:44.874859 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 2nd time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:44.922591 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 2nd time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:45.866775 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 2nd time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:46.194974 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 2nd time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:46.482845 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 2nd time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:46.750792 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 2nd time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:48.198838 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 3rd time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:48.462121 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 3rd time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:49.749467 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 3rd time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:50.029438 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 3rd time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:50.834835 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 3rd time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:51.334847 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 3rd time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:51.431052 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 3rd time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:51.537615 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 3rd time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:52.567197 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 3rd time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:52.967177 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 3rd time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:53.615078 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 4th time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:54.513959 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 4th time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:56.442819 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 4th time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:57.527549 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 4th time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:57.765172 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 4th time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:57.982839 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 4th time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:58.099625 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 4th time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:58.534632 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 4th time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:59.007106 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 4th time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:59.947380 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 4th time calling it.
[airflow.sdk.api.client]
2026-03-10 02:44:02.200313 [warning ] Failed to send heartbeat. Will be
retried [supervisor] failed_heartbeats=1 max_retries=3
ti_id=UUID('019cd54c-28b0-7e18-9a7b-71ba469bf545')
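
To summarize the retry warnings in the worker log, a small script along these lines could tally how many warnings were logged per attempt number. This is only a sketch for reading the excerpt: the function name `count_retry_attempts` and the `sample` text are illustrative, not part of Airflow.

```python
import re
from collections import Counter

# Matches the ordinal in retry warnings such as
# "... this is the 2nd time calling it."
ATTEMPT_RE = re.compile(r"this is the (\d+)(?:st|nd|rd|th) time calling it")


def count_retry_attempts(log_text: str) -> Counter:
    """Return how many warnings were logged for each attempt number."""
    # Log lines are wrapped in this report, so collapse whitespace first.
    flat = " ".join(log_text.split())
    return Counter(int(n) for n in ATTEMPT_RE.findall(flat))


# Two entries copied from the worker log above (hypothetical sample input).
sample = """
2026-03-10 02:43:37.239128 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 1st time calling it.
[airflow.sdk.api.client]
2026-03-10 02:43:42.658853 [warning ] Starting call to
'airflow.sdk.api.client.Client.request', this is the 2nd time calling it.
[airflow.sdk.api.client]
"""

print(count_retry_attempts(sample))
```

Run against the full worker excerpt above, this should report 10 warnings for each of attempts 1 through 4, all between 02:43:37 and 02:43:59, just before the first failed heartbeat at 02:44:02.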
API Server
----------------------
2026-03-10 02:45:23 [debug ] Retrieved current task state
current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local
current_pid=152133 state=running ti_id=019cd542-0d3e-7467-9f7a-4dfc2d7f0017
2026-03-10 02:45:23 [debug ] Retrieved current task state
current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local
current_pid=155023 state=running ti_id=019cd518-d7c9-7e7e-bde2-efc6322e36a3
2026-03-10 02:45:23 [debug ] Retrieved current task state
current_hostname=airflow-worker-0.airflow-worker.de-services.svc.cluster.local
current_pid=81402 state=running ti_id=019cd578-f8c1-7125-9906-ef64229dbba5
2026-03-10 02:45:23 [debug ] Retrieved current task state
current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local
current_pid=157917 state=running ti_id=019cd54c-28ad-7db0-b0f8-d64ed0916d78
2026-03-10 02:45:23 [debug ] Retrieved current task state
current_hostname=airflow-worker-0.airflow-worker.de-services.svc.cluster.local
current_pid=86154 state=running ti_id=019cd542-0d45-75e4-95d5-a2c461e3e559
2026-03-10 02:45:23 [debug ] Heartbeat updated state=running
ti_id=019cd542-0d3e-7467-9f7a-4dfc2d7f0017
INFO: 10.10.12.52:40870 - "GET /api/v2/version HTTP/1.1" 200 OK
INFO: 10.10.12.52:40880 - "GET /api/v2/version HTTP/1.1" 200 OK
2026-03-10 02:45:23 [debug ] Processing heartbeat
hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local
pid=151395 ti_id=019cd542-0d47-7d93-a021-0cc2c9de7344
2026-03-10 02:45:23 [debug ] Refreshed token issued to Task
[airflow.api_fastapi.execution_api.deps] refresh_when_less_than=120
valid_left=73
2026-03-10 02:45:23 [debug ] Refreshed token issued to Task
[airflow.api_fastapi.execution_api.deps] refresh_when_less_than=120
valid_left=73
2026-03-10 02:45:23 [debug ] Heartbeat updated state=running
ti_id=019cd526-91bc-7461-8be3-aa7574c5f60b
2026-03-10 02:45:23 [debug ] Processing heartbeat
hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local
pid=155023 ti_id=019cd518-d7c9-7e7e-bde2-efc6322e36a3
[2026-03-10T02:45:23.575+0000] {exceptions.py:77} ERROR - Error with id
9zBmdizJ
File
"/home/airflow/.local/lib/python3.12/site-packages/starlette/_exception_handler.py",
line 42, in wrapped_app
await app(scope, receive, sender)
File
"/home/airflow/.local/lib/python3.12/site-packages/starlette/routing.py", line
75, in app
response = await f(request)
^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/fastapi/routing.py", line
302, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/fastapi/routing.py", line
213, in run_endpoint_function
return await dependant.call(**values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/cadwyn/structure/versions.py",
line 474, in decorator
response = await self._convert_endpoint_response_to_version(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/cadwyn/structure/versions.py",
line 520, in _convert_endpoint_response_to_version
response_or_response_body: Union[FastapiResponse, object] = await
run_in_threadpool(
^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/starlette/concurrency.py",
line 38, in run_in_threadpool
return await anyio.to_thread.run_sync(func)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/anyio/to_thread.py", line
56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py",
line 2476, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py",
line 967, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/cadwyn/schema_generation.py",
line 515, in __call__
return self._original_callable(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/airflow/api_fastapi/execution_api/routes/xcoms.py",
line 419, in set_xcom
session.flush()
File
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/session.py",
line 3449, in flush
self._flush(objects)
File
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/session.py",
line 3588, in _flush
with util.safe_reraise():
^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/util/langhelpers.py",
line 70, in __exit__
compat.raise_(
File
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/util/compat.py",
line 211, in raise_
raise exception
File
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/session.py",
line 3549, in _flush
flush_context.execute()
File
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/unitofwork.py",
line 456, in execute
rec.execute(self)
File
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/unitofwork.py",
line 630, in execute
util.preloaded.orm_persistence.save_obj(
File
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/persistence.py",
line 245, in save_obj
_emit_insert_statements(
File
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/persistence.py",
line 1097, in _emit_insert_statements
c = connection._execute_20(
^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py",
line 1710, in _execute_20
return meth(self, args_10style, kwargs_10style, execution_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/sql/elements.py",
line 334, in _execute_on_connection
return connection._execute_clauseelement(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py",
line 1577, in _execute_clauseelement
ret = self._execute_context(
^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py",
line 1953, in _execute_context
self._handle_dbapi_exception(
File
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py",
line 2134, in _handle_dbapi_exception
util.raise_(
File
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/util/compat.py",
line 211, in raise_
raise exception
File
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py",
line 1910, in _execute_context
self.dialect.do_execute(
File
"/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/default.py",
line 736, in do_execute
cursor.execute(statement, parameters)
File
"/home/airflow/.local/lib/python3.12/site-packages/MySQLdb/cursors.py", line
179, in execute
res = self._query(mogrified_query)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.12/site-packages/MySQLdb/cursors.py", line
330, in _query
db.query(q)
File
"/home/airflow/.local/lib/python3.12/site-packages/MySQLdb/connections.py",
line 280, in query
_mysql.connection.query(self, query)
2026-03-10 02:45:23 [debug ] Heartbeat updated state=running
ti_id=019cd518-d7c9-7e7e-bde2-efc6322e36a3
2026-03-10 02:45:23 [debug ] Heartbeat updated state=running
ti_id=019cd54c-28ad-7db0-b0f8-d64ed0916d78
2026-03-10 02:45:23 [debug ] Retrieved current task state
current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local
current_pid=65618 state=running ti_id=019cd526-91bc-7461-8be3-aa7574c5f60b
2026-03-10 02:45:23 [debug ] Heartbeat updated state=running
ti_id=019cd526-91bc-7461-8be3-aa7574c5f60b
2026-03-10 02:45:23 [debug ] Retrieved current task state
current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local
current_pid=151858 state=running ti_id=019cd542-0d49-744c-aa72-a33d5ac4249d
2026-03-10 02:45:23 [debug ] Heartbeat updated state=running
ti_id=019cd542-0d45-75e4-95d5-a2c461e3e559
2026-03-10 02:45:23 [debug ] Heartbeat updated state=running
ti_id=019cd542-0d49-744c-aa72-a33d5ac4249d
2026-03-10 02:45:23 [debug ] Retrieved current task state
current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local
current_pid=152133 state=running ti_id=019cd542-0d3e-7467-9f7a-4dfc2d7f0017
2026-03-10 02:45:23 [debug ] Retrieved current task state
current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local
current_pid=157917 state=running ti_id=019cd54c-28ad-7db0-b0f8-d64ed0916d78
2026-03-10 02:45:23 [debug ] Heartbeat updated state=running
ti_id=019cd542-0d3e-7467-9f7a-4dfc2d7f0017
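
One way to cross-check the two excerpts is to collect the task instance ids from each and see which ids from the worker side never show up on the API server side. The sketch below assumes the log formats shown above; `task_instance_ids` is a hypothetical helper, and the inputs are a few lines copied from this report.

```python
import re

# Matches ti_id=<uuid> as well as ti_id=UUID('<uuid>').
TI_ID_RE = re.compile(
    r"ti_id=(?:UUID\(')?([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
)


def task_instance_ids(log_text: str) -> set[str]:
    """Collect every task instance id mentioned in a log excerpt."""
    return set(TI_ID_RE.findall(log_text))


# Line from the worker's failed-heartbeat warning.
worker_line = "ti_id=UUID('019cd54c-28b0-7e18-9a7b-71ba469bf545')"

# A few lines from the API server excerpt.
api_lines = """
current_pid=157917 state=running ti_id=019cd54c-28ad-7db0-b0f8-d64ed0916d78
2026-03-10 02:45:23 [debug ] Heartbeat updated state=running
ti_id=019cd542-0d3e-7467-9f7a-4dfc2d7f0017
"""

failed = task_instance_ids(worker_line)
seen_by_api = task_instance_ids(api_lines)
missing = failed - seen_by_api
print(missing)
```

In the excerpts above, the `ti_id` from the failed-heartbeat warning does not appear among the API server lines, although those lines only cover 02:45:23, so this is a hint rather than proof that its heartbeats never arrived.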
What you think should happen instead?
The Airflow task should keep running without being marked as failed.
### Committer
- [x] I acknowledge that I am a maintainer/committer of the Apache Airflow
project.