rcrchawla opened a new issue, #63243:
URL: https://github.com/apache/airflow/issues/63243

   ### Body
   
   An Airflow task fails while its Spark Kubernetes application is still running. The Spark application is long-running, usually around 1-2 hours, and many tasks run concurrently. The failures usually happen between 02:30 and 03:45 UTC.
   
   Q) What is causing this issue?
   A) The Airflow task fails while the Spark Kubernetes application is still running.
   
   Airflow version -- **3.0.4**
   
   Setup config:

   - 2 API servers
   - 2 workers
   - 1 DAG processor
   - 2 schedulers
   
   Deployment: Helm chart deployment on Azure Kubernetes Service (AKS)

   Please check the logs below.
   
   
   Worker logs:
   -------------------------------------
   2026-03-10 02:33:56.191330 [info     ] Task execute_workload[8cbabf91-009f-44a6-86d1-bef109c70341] succeeded in 2715.019189195242s: None [celery.app.trace]
   2026-03-10 02:39:57.112078 [info     ] Task finished                  [supervisor] duration=1723.7576029417105 exit_code=0 final_state=success
   2026-03-10 02:39:57.128929 [info     ] Task execute_workload[9b3f27ec-09b5-424e-8d5c-412e541f51e8] succeeded in 1723.8186896019615s: None [celery.app.trace]
   2026-03-10 02:40:50.688403 [info     ] Task finished                  [supervisor] duration=744.0669570546597 exit_code=0 final_state=success
   2026-03-10 02:40:50.705538 [info     ] Task execute_workload[b08ac31a-2ee7-4029-b897-753157b18475] succeeded in 744.139388079755s: None [celery.app.trace]
   2026-03-10 02:42:11.649891 [info     ] Task finished                  [supervisor] duration=756.7588595808484 exit_code=0 final_state=success
   2026-03-10 02:42:11.666368 [info     ] Task execute_workload[0351c271-194e-4e58-87e4-a9c224351ab1] succeeded in 756.8229349320754s: None [celery.app.trace]
   2026-03-10 02:43:37.239128 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:38.119304 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:38.640468 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:39.247588 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:39.425843 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:39.618220 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:40.002999 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:40.582177 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:41.186771 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:41.510710 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:42.658853 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:43.171303 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:43.826966 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:44.330891 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:44.874859 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:44.922591 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:45.866775 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:46.194974 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:46.482845 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:46.750792 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:48.198838 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:48.462121 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:49.749467 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:50.029438 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:50.834835 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:51.334847 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:51.431052 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:51.537615 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:52.567197 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:52.967177 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:53.615078 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:54.513959 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:56.442819 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:57.527549 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:57.765172 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:57.982839 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:58.099625 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:58.534632 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:59.007106 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
   2026-03-10 02:43:59.947380 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]
   2026-03-10 02:44:02.200313 [warning  ] Failed to send heartbeat. Will be retried [supervisor] failed_heartbeats=1 max_retries=3 ti_id=UUID('019cd54c-28b0-7e18-9a7b-71ba469bf545')
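
The burst of retry warnings above is easier to quantify with a small script. Below is a stdlib-only Python sketch. It scans a raw worker log with one record per line, as the worker writes it rather than an email-wrapped copy. For each line it:

- counts the `Starting call to ... this is the Nth time calling it` warnings by attempt number,
- records the time window they span, and
- extracts the fields of the `Failed to send heartbeat` warning.

The regular expressions are assumptions based only on the message formats visible in this excerpt, and other Airflow versions may log differently. `summarize_worker_log` and `SAMPLE_WORKER_LINES` are hypothetical names.

```python
import re
from collections import Counter
from datetime import datetime

# Patterns inferred from the worker excerpt in this issue (assumption: other
# Airflow versions or log configurations may format these records differently).
TIMESTAMP_RE = re.compile(r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})")
RETRY_RE = re.compile(
    r"Starting call to '([^']+)', this is the (\d+)(?:st|nd|rd|th) time calling it"
)
HEARTBEAT_FAIL_RE = re.compile(
    r"Failed to send heartbeat.*failed_heartbeats=(\d+) max_retries=(\d+)"
    r" ti_id=UUID\('([0-9a-f-]+)'\)"
)


def summarize_worker_log(lines):
    """Count client retries by attempt number, find the window they span,
    and collect 'Failed to send heartbeat' records."""
    attempts = Counter()
    retry_times = []
    heartbeat_failures = []
    for line in lines:
        ts_match = TIMESTAMP_RE.match(line)
        ts = (
            datetime.strptime(ts_match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            if ts_match
            else None
        )
        if m := RETRY_RE.search(line):
            attempts[int(m.group(2))] += 1
            if ts is not None:
                retry_times.append(ts)
        elif m := HEARTBEAT_FAIL_RE.search(line):
            failed, max_retries, ti_id = m.groups()
            heartbeat_failures.append(
                {"ti_id": ti_id, "failed": int(failed),
                 "max_retries": int(max_retries), "at": ts}
            )
    window = (min(retry_times), max(retry_times)) if retry_times else None
    return attempts, window, heartbeat_failures


# A few records copied from the excerpt above (hypothetical sample input).
SAMPLE_WORKER_LINES = [
    "2026-03-10 02:43:37.239128 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 1st time calling it. [airflow.sdk.api.client]",
    "2026-03-10 02:43:42.658853 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 2nd time calling it. [airflow.sdk.api.client]",
    "2026-03-10 02:43:48.198838 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 3rd time calling it. [airflow.sdk.api.client]",
    "2026-03-10 02:43:53.615078 [warning  ] Starting call to 'airflow.sdk.api.client.Client.request', this is the 4th time calling it. [airflow.sdk.api.client]",
    "2026-03-10 02:44:02.200313 [warning  ] Failed to send heartbeat. Will be retried [supervisor] failed_heartbeats=1 max_retries=3 ti_id=UUID('019cd54c-28b0-7e18-9a7b-71ba469bf545')",
]

if __name__ == "__main__":
    attempts, window, failures = summarize_worker_log(SAMPLE_WORKER_LINES)
    print("retry attempts by ordinal:", dict(sorted(attempts.items())))
    if window:
        print("retry window:", window[0].time(), "->", window[1].time())
    for failure in failures:
        print("heartbeat failure:", failure)
```

Run it against the full worker log rather than this sample. In the excerpt above, each of attempts 1 through 4 appears ten times, all between 02:43:37 and 02:43:59. The first failed heartbeat (`failed_heartbeats=1 max_retries=3`) follows at 02:44:02.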
   
   
   API server logs:
   ----------------------
   
   2026-03-10 02:45:23 [debug    ] Retrieved current task state   current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local current_pid=152133 state=running ti_id=019cd542-0d3e-7467-9f7a-4dfc2d7f0017
   2026-03-10 02:45:23 [debug    ] Retrieved current task state   current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local current_pid=155023 state=running ti_id=019cd518-d7c9-7e7e-bde2-efc6322e36a3
   2026-03-10 02:45:23 [debug    ] Retrieved current task state   current_hostname=airflow-worker-0.airflow-worker.de-services.svc.cluster.local current_pid=81402 state=running ti_id=019cd578-f8c1-7125-9906-ef64229dbba5
   2026-03-10 02:45:23 [debug    ] Retrieved current task state   current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local current_pid=157917 state=running ti_id=019cd54c-28ad-7db0-b0f8-d64ed0916d78
   2026-03-10 02:45:23 [debug    ] Retrieved current task state   current_hostname=airflow-worker-0.airflow-worker.de-services.svc.cluster.local current_pid=86154 state=running ti_id=019cd542-0d45-75e4-95d5-a2c461e3e559
   2026-03-10 02:45:23 [debug    ] Heartbeat updated              state=running ti_id=019cd542-0d3e-7467-9f7a-4dfc2d7f0017
   INFO:     10.10.12.52:40870 - "GET /api/v2/version HTTP/1.1" 200 OK
   INFO:     10.10.12.52:40880 - "GET /api/v2/version HTTP/1.1" 200 OK
   2026-03-10 02:45:23 [debug    ] Processing heartbeat           hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local pid=151395 ti_id=019cd542-0d47-7d93-a021-0cc2c9de7344
   2026-03-10 02:45:23 [debug    ] Refreshed token issued to Task [airflow.api_fastapi.execution_api.deps] refresh_when_less_than=120 valid_left=73
   2026-03-10 02:45:23 [debug    ] Refreshed token issued to Task [airflow.api_fastapi.execution_api.deps] refresh_when_less_than=120 valid_left=73
   2026-03-10 02:45:23 [debug    ] Heartbeat updated              state=running ti_id=019cd526-91bc-7461-8be3-aa7574c5f60b
   2026-03-10 02:45:23 [debug    ] Processing heartbeat           hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local pid=155023 ti_id=019cd518-d7c9-7e7e-bde2-efc6322e36a3
   [2026-03-10T02:45:23.575+0000] {exceptions.py:77} ERROR - Error with id 9zBmdizJ
     File "/home/airflow/.local/lib/python3.12/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
       await app(scope, receive, sender)
     File "/home/airflow/.local/lib/python3.12/site-packages/starlette/routing.py", line 75, in app
       response = await f(request)
                  ^^^^^^^^^^^^^^^^
     File "/home/airflow/.local/lib/python3.12/site-packages/fastapi/routing.py", line 302, in app
       raw_response = await run_endpoint_function(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/home/airflow/.local/lib/python3.12/site-packages/fastapi/routing.py", line 213, in run_endpoint_function
       return await dependant.call(**values)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/home/airflow/.local/lib/python3.12/site-packages/cadwyn/structure/versions.py", line 474, in decorator
       response = await self._convert_endpoint_response_to_version(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/home/airflow/.local/lib/python3.12/site-packages/cadwyn/structure/versions.py", line 520, in _convert_endpoint_response_to_version
       response_or_response_body: Union[FastapiResponse, object] = await run_in_threadpool(
                                                                   ^^^^^^^^^^^^^^^^^^^^^^^^
     File "/home/airflow/.local/lib/python3.12/site-packages/starlette/concurrency.py", line 38, in run_in_threadpool
       return await anyio.to_thread.run_sync(func)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/home/airflow/.local/lib/python3.12/site-packages/anyio/to_thread.py", line 56, in run_sync
       return await get_async_backend().run_sync_in_worker_thread(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/home/airflow/.local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2476, in run_sync_in_worker_thread
       return await future
              ^^^^^^^^^^^^
     File "/home/airflow/.local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 967, in run
       result = context.run(func, *args)
                ^^^^^^^^^^^^^^^^^^^^^^^^
     File "/home/airflow/.local/lib/python3.12/site-packages/cadwyn/schema_generation.py", line 515, in __call__
       return self._original_callable(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/home/airflow/.local/lib/python3.12/site-packages/airflow/api_fastapi/execution_api/routes/xcoms.py", line 419, in set_xcom
       session.flush()
     File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 3449, in flush
       self._flush(objects)
     File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 3588, in _flush
       with util.safe_reraise():
            ^^^^^^^^^^^^^^^^^^^
     File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
       compat.raise_(
     File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
       raise exception
     File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 3549, in _flush
       flush_context.execute()
     File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/unitofwork.py", line 456, in execute
       rec.execute(self)
     File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/unitofwork.py", line 630, in execute
       util.preloaded.orm_persistence.save_obj(
     File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/persistence.py", line 245, in save_obj
       _emit_insert_statements(
     File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/orm/persistence.py", line 1097, in _emit_insert_statements
       c = connection._execute_20(
           ^^^^^^^^^^^^^^^^^^^^^^^
     File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1710, in _execute_20
       return meth(self, args_10style, kwargs_10style, execution_options)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
       return connection._execute_clauseelement(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1577, in _execute_clauseelement
       ret = self._execute_context(
             ^^^^^^^^^^^^^^^^^^^^^^
     File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1953, in _execute_context
       self._handle_dbapi_exception(
     File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 2134, in _handle_dbapi_exception
       util.raise_(
     File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
       raise exception
     File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
       self.dialect.do_execute(
     File "/home/airflow/.local/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
       cursor.execute(statement, parameters)
     File "/home/airflow/.local/lib/python3.12/site-packages/MySQLdb/cursors.py", line 179, in execute
       res = self._query(mogrified_query)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/home/airflow/.local/lib/python3.12/site-packages/MySQLdb/cursors.py", line 330, in _query
       db.query(q)
     File "/home/airflow/.local/lib/python3.12/site-packages/MySQLdb/connections.py", line 280, in query
       _mysql.connection.query(self, query)
   
   2026-03-10 02:45:23 [debug    ] Heartbeat updated              state=running ti_id=019cd518-d7c9-7e7e-bde2-efc6322e36a3
   2026-03-10 02:45:23 [debug    ] Heartbeat updated              state=running ti_id=019cd54c-28ad-7db0-b0f8-d64ed0916d78
   2026-03-10 02:45:23 [debug    ] Retrieved current task state   current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local current_pid=65618 state=running ti_id=019cd526-91bc-7461-8be3-aa7574c5f60b
   2026-03-10 02:45:23 [debug    ] Heartbeat updated              state=running ti_id=019cd526-91bc-7461-8be3-aa7574c5f60b
   2026-03-10 02:45:23 [debug    ] Retrieved current task state   current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local current_pid=151858 state=running ti_id=019cd542-0d49-744c-aa72-a33d5ac4249d
   2026-03-10 02:45:23 [debug    ] Heartbeat updated              state=running ti_id=019cd542-0d45-75e4-95d5-a2c461e3e559
   2026-03-10 02:45:23 [debug    ] Heartbeat updated              state=running ti_id=019cd542-0d49-744c-aa72-a33d5ac4249d
   2026-03-10 02:45:23 [debug    ] Retrieved current task state   current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local current_pid=152133 state=running ti_id=019cd542-0d3e-7467-9f7a-4dfc2d7f0017
   2026-03-10 02:45:23 [debug    ] Retrieved current task state   current_hostname=airflow-worker-1.airflow-worker.de-services.svc.cluster.local current_pid=157917 state=running ti_id=019cd54c-28ad-7db0-b0f8-d64ed0916d78
   2026-03-10 02:45:23 [debug    ] Heartbeat updated              state=running ti_id=019cd542-0d3e-7467-9f7a-4dfc2d7f0017
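
A similar hedged sketch can check the API server side. It counts `Heartbeat updated` records per `ti_id` and collects the ids of `ERROR - Error with id ...` records. This shows whether the `ti_id` from the worker's `Failed to send heartbeat` warning (`019cd54c-28b0-7e18-9a7b-71ba469bf545`) appears in the API server log at all. The patterns are inferred only from the lines in this excerpt. `summarize_api_log` and `SAMPLE_API_LINES` are hypothetical names.

```python
import re
from collections import Counter

# Patterns inferred from the API server excerpt in this issue (assumption: other
# Airflow versions or log configurations may format these records differently).
HEARTBEAT_UPDATED_RE = re.compile(r"Heartbeat updated\s+state=(\w+) ti_id=([0-9a-f-]+)")
ERROR_ID_RE = re.compile(r"ERROR - Error with id (\S+)")


def summarize_api_log(lines):
    """Count 'Heartbeat updated' records per ti_id and collect logged error ids."""
    heartbeats = Counter()
    error_ids = []
    for line in lines:
        if m := HEARTBEAT_UPDATED_RE.search(line):
            heartbeats[m.group(2)] += 1
        elif m := ERROR_ID_RE.search(line):
            error_ids.append(m.group(1))
    return heartbeats, error_ids


# A few records copied from the excerpt above (hypothetical sample input).
SAMPLE_API_LINES = [
    "2026-03-10 02:45:23 [debug    ] Heartbeat updated              state=running ti_id=019cd542-0d3e-7467-9f7a-4dfc2d7f0017",
    "[2026-03-10T02:45:23.575+0000] {exceptions.py:77} ERROR - Error with id 9zBmdizJ",
    "2026-03-10 02:45:23 [debug    ] Heartbeat updated              state=running ti_id=019cd54c-28ad-7db0-b0f8-d64ed0916d78",
    "2026-03-10 02:45:23 [debug    ] Heartbeat updated              state=running ti_id=019cd542-0d3e-7467-9f7a-4dfc2d7f0017",
]

if __name__ == "__main__":
    heartbeats, error_ids = summarize_api_log(SAMPLE_API_LINES)
    # ti_id reported by the worker's "Failed to send heartbeat" warning.
    failing_ti = "019cd54c-28b0-7e18-9a7b-71ba469bf545"
    print("heartbeat updates per ti_id:", dict(heartbeats))
    print("error ids:", error_ids)
    print(f"heartbeat updates for {failing_ti}:", heartbeats.get(failing_ti, 0))
```

The ids `019cd54c-28ad-...` and `019cd54c-28b0-...` share a long prefix, so the sketch matches on the full `ti_id` rather than a prefix to avoid conflating the two task instances.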
   
   
   What do you think should happen instead?
   
   The Airflow task should complete without failing.
   
   ### Committer
   
   - [x] I acknowledge that I am a maintainer/committer of the Apache Airflow project.

