Biruk Tesfaye created SPARK-55448:
-------------------------------------
Summary: Spark Connect query execution events dropped when session
closes while query running
Key: SPARK-55448
URL: https://issues.apache.org/jira/browse/SPARK-55448
Project: Spark
Issue Type: Bug
Components: Connect
Affects Versions: 4.1.1, 3.5.8, 4.0.2, 3.4.4
Reporter: Biruk Tesfaye
There is a race condition in Spark Connect's execution lifecycle where
SparkListenerBus query events (Cancelled/Closed) aren't sent if a session
closes while a query is still running.
The root cause is ExecuteEventsManager.assertStatus(), which validates that the
session status is Started before allowing any event to be sent. When a session
is closed, the closing thread interrupts all execution threads for the session
and then marks the session as closed. Each interrupted thread's cleanup path
then tries to send its terminal event. If the closing thread has already marked
the session as closed by that point, the assert fails with an
IllegalStateException and the event never reaches the listener bus.
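The ordering that loses the event can be illustrated with a small,
deterministic model. This is only a sketch in Python, not Spark code: the
EventsManager and Session classes, the post_closed method, and the list
standing in for the listener bus are all hypothetical, and RuntimeError stands
in for the IllegalStateException described above.

```python
from enum import Enum, auto


class SessionStatus(Enum):
    PENDING = auto()
    STARTED = auto()
    CLOSED = auto()


class Session:
    def __init__(self):
        self.status = SessionStatus.STARTED


class EventsManager:
    """Toy stand-in for ExecuteEventsManager; not the real Spark class."""

    def __init__(self, session, bus):
        self.session = session
        self.bus = bus  # plain list standing in for the listener bus

    def _assert_status(self):
        # Strict check: only a Started session may post events.
        if self.session.status is not SessionStatus.STARTED:
            raise RuntimeError(f"session status is {self.session.status.name}")

    def post_closed(self, op_id):
        self._assert_status()
        self.bus.append(("Closed", op_id))


def close_session_then_cleanup():
    bus = []
    session = Session()
    manager = EventsManager(session, bus)
    # The closing thread wins the race: the session is marked closed first.
    session.status = SessionStatus.CLOSED
    # The interrupted execution thread now runs its cleanup path.
    try:
        manager.post_closed("op-1")
        error = None
    except RuntimeError as exc:
        error = str(exc)
    return bus, error
```

In this model the bus stays empty whenever the status flips before cleanup
runs, which mirrors the dropped Cancelled/Closed events reported here.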
The proposed fix allows ExecuteEventsManager.assertStatus() to accept
SessionStatus.Closed alongside Started so terminal events can still be sent
during execution cleanup. This is safe because the underlying SparkContext and
its listener bus outlive the Spark Connect session, and all downstream
operations after the assert are independent of session state. The assert
continues to block SessionStatus.Pending as a guard against events on
uninitialized sessions.
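The relaxed check could look roughly like the following. This is a hedged
Python sketch of the idea, not the actual patch: assert_status,
post_terminal_event, and ALLOWED_FOR_EVENTS are hypothetical names, a list
stands in for the listener bus, and RuntimeError stands in for
IllegalStateException.

```python
from enum import Enum, auto


class SessionStatus(Enum):
    PENDING = auto()
    STARTED = auto()
    CLOSED = auto()


# Statuses in which posting an execution event is permitted in this sketch.
ALLOWED_FOR_EVENTS = frozenset({SessionStatus.STARTED, SessionStatus.CLOSED})


def assert_status(status):
    # Pending is still rejected: the session was never initialized.
    if status not in ALLOWED_FOR_EVENTS:
        raise RuntimeError(f"session status is {status.name}")


def post_terminal_event(status, bus, event):
    assert_status(status)
    bus.append(event)
```

With this check, a cleanup path that runs after the session was marked closed
still delivers its event, while a Pending session continues to raise.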
A more robust solution might be to introduce a SessionStatus.Closing state that
exists until all execution threads are cleaned up, then transitions to
SessionStatus.Closed.
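One way such an intermediate state might behave is sketched below in Python.
Everything here is an assumption made for illustration: the Session class, its
start_execution, close, and finish_execution methods, and the set of tracked
operation ids do not correspond to existing Spark Connect APIs, and
RuntimeError stands in for IllegalStateException.

```python
import threading
from enum import Enum, auto


class SessionStatus(Enum):
    PENDING = auto()
    STARTED = auto()
    CLOSING = auto()
    CLOSED = auto()


class Session:
    """Toy session that stays in CLOSING until every execution has cleaned up."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = set()  # ids of executions that have not cleaned up yet
        self.status = SessionStatus.STARTED
        self.bus = []  # plain list standing in for the listener bus

    def start_execution(self, op_id):
        with self._lock:
            if self.status is not SessionStatus.STARTED:
                raise RuntimeError(f"session status is {self.status.name}")
            self._active.add(op_id)

    def close(self):
        with self._lock:
            # Go straight to CLOSED only when nothing is left to clean up.
            if self._active:
                self.status = SessionStatus.CLOSING
            else:
                self.status = SessionStatus.CLOSED

    def finish_execution(self, op_id, event):
        with self._lock:
            # Terminal events are accepted while STARTED or CLOSING only.
            if self.status not in (SessionStatus.STARTED, SessionStatus.CLOSING):
                raise RuntimeError(f"session status is {self.status.name}")
            self.bus.append((event, op_id))
            self._active.discard(op_id)
            # The last execution to clean up completes the close.
            if self.status is SessionStatus.CLOSING and not self._active:
                self.status = SessionStatus.CLOSED
```

The design choice in this sketch is that the final transition is driven by the
last execution to finish cleanup, so the status check can remain strict about
Closed without dropping terminal events.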