[ https://issues.apache.org/jira/browse/SPARK-57657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-57657:
-----------------------------------
    Labels: pull-request-available  (was: )

> Spark Connect should not drop streaming listener events on a transient send failure
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-57657
>                 URL: https://issues.apache.org/jira/browse/SPARK-57657
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect
>    Affects Versions: 5.0.0
>            Reporter: Hyukjin Kwon
>            Priority: Major
>              Labels: pull-request-available
>
> {{ClientStreamingQuerySuite."listener events"}} is flaky (e.g. in the Java 21
> Connect build): the QueryStarted and QueryProgress events arrive, but the
> terminal QueryTerminatedEvent is never received, even though the query has
> stopped.
>
> On the first {{onNext}} exception, {{SparkConnectListenerBusListener.send}}
> removes the server-side listener and stops sending all further events. A
> single transient failure while sending one of the frequent progress events
> therefore silently drops the later QueryTerminatedEvent.
>
> Fix: retry {{onNext}} a small, bounded number of times before tearing the
> listener down; cleanup still happens when the client is genuinely
> unresponsive. Test diagnostics are also added to help debug any future
> recurrence in scheduled jobs, since the Connect server logs are not captured
> in CI.
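A minimal Python sketch of the bounded-retry idea, for illustration only: the real SparkConnectListenerBusListener is server-side Scala code, and the names RetryingEventSender, FlakyClient, max_attempts and the value 3 are hypothetical. The on_next callable stands in for the gRPC onNext call, and no backoff between attempts is modeled. Setting max_attempts=1 mimics the reported behavior of tearing the listener down on the first exception.

```python
from typing import Callable, Iterable, List, Set


class RetryingEventSender:
    """Hypothetical sketch (not Spark's code): forwards listener events to a
    client stream, retrying a failed send a bounded number of times before
    tearing the listener down."""

    def __init__(self, on_next: Callable[[str], None], max_attempts: int = 3) -> None:
        self._on_next = on_next            # stands in for the gRPC onNext call
        self._max_attempts = max_attempts  # illustrative bound, not Spark's value
        self.removed = False               # True once the listener is torn down

    def send(self, event: str) -> bool:
        """Return True if the event was delivered, False if it was dropped."""
        if self.removed:
            return False  # already torn down: every later event is dropped
        for _ in range(self._max_attempts):
            try:
                self._on_next(event)
                return True
            except Exception:
                continue  # assume the failure may be transient and retry
        # All attempts failed: treat the client as unresponsive and clean up.
        self.removed = True
        return False


class FlakyClient:
    """Hypothetical client whose on_next fails on the given (1-based) call numbers."""

    def __init__(self, fail_on_calls: Iterable[int]) -> None:
        self.calls = 0
        self.fail_on_calls: Set[int] = set(fail_on_calls)
        self.received: List[str] = []

    def on_next(self, event: str) -> None:
        self.calls += 1
        if self.calls in self.fail_on_calls:
            raise ConnectionError("transient send failure")
        self.received.append(event)


EVENTS = ["QueryStarted", "QueryProgress", "QueryTerminated"]

# max_attempts=1 mimics tearing down on the first onNext exception:
# the failed progress send removes the listener, so the terminate event is lost.
old_client = FlakyClient(fail_on_calls={2})
old_sender = RetryingEventSender(old_client.on_next, max_attempts=1)
for e in EVENTS:
    old_sender.send(e)
print(old_client.received)  # ['QueryStarted']

# With a few retries, the same single failure is absorbed and nothing is dropped.
new_client = FlakyClient(fail_on_calls={2})
new_sender = RetryingEventSender(new_client.on_next, max_attempts=3)
for e in EVENTS:
    new_sender.send(e)
print(new_client.received)  # ['QueryStarted', 'QueryProgress', 'QueryTerminated']
```

A client that fails on every call still exhausts its attempts on the first event and is removed, which corresponds to the cleanup for genuinely unresponsive clients that the fix keeps.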



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
