[
https://issues.apache.org/jira/browse/SPARK-57657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon updated SPARK-57657:
---------------------------------
Description:
{{ClientStreamingQuerySuite."listener events"}} is flaky (e.g. Java 21
connect): the QueryStarted/QueryProgress events arrive but the terminal
QueryTerminatedEvent is never received, even though the query has stopped.
{{SparkConnectListenerBusListener.send}} removes the server-side listener and
stops sending ALL further events on the first {{onNext}} exception, so a single
transient failure on a frequent progress event silently drops the later
terminate event. Fix: retry {{onNext}} a small bounded number of times before
tearing the listener down (cleanup still happens when the client is genuinely
unresponsive). Test diagnostics are added to debug any future recurrence in
scheduled jobs, since the connect server logs are not captured in CI.
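
The bounded-retry idea described above can be sketched roughly as follows. This is only an illustration in Python, not the actual Scala change: {{send_with_retry}}, {{max_attempts}}, {{on_give_up}} and {{FlakyObserver}} are hypothetical names, and the retry count of 3 is an assumption, since the description only says "a small bounded number of times".

```python
from typing import Callable, List


def send_with_retry(on_next: Callable[[str], None],
                    event: str,
                    on_give_up: Callable[[], None],
                    max_attempts: int = 3) -> bool:
    """Try to deliver one event, retrying up to max_attempts times.

    on_give_up stands in for removing the server-side listener; it is
    called only once every attempt for this event has failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            on_next(event)
            return True  # delivered, keep the listener registered
        except Exception:
            if attempt == max_attempts:
                # Client looks genuinely unresponsive: clean up.
                on_give_up()
    return False


class FlakyObserver:
    """Hypothetical stand-in for the client stream: fails the first N sends."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.received: List[str] = []

    def on_next(self, event: str) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise IOError("transient send failure")
        self.received.append(event)
```

In this sketch, a single transient failure on an early event is absorbed by the retry, so a later terminate event still reaches the observer, while an observer that fails every attempt still triggers the cleanup callback exactly once.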
> Spark Connect should not drop streaming listener events on a transient send
> failure
> -----------------------------------------------------------------------------------
>
> Key: SPARK-57657
> URL: https://issues.apache.org/jira/browse/SPARK-57657
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 5.0.0
> Reporter: Hyukjin Kwon
> Priority: Major