[ 
https://issues.apache.org/jira/browse/SPARK-23785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412130#comment-16412130
 ] 

Marcelo Vanzin commented on SPARK-23785:
----------------------------------------

This is a little trickier than just the checks you have in the PR.

The check that is triggering in Hive is on the {{LauncherBackend}} side. So 
the backend's connection has somehow already been closed by the time a 
{{setState}} call happens. That can happen if there are two calls to 
{{LocalSchedulerBackend.stop}}, which in turn can happen if someone with a 
launcher handle calls {{stop()}} on the handle. But the code should be safe 
against that and just ignore subsequent calls.
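
To illustrate what "ignore subsequent calls" could look like, here is a minimal 
sketch in Java. The class and field names are made up for illustration and are 
not the actual Spark code; the only point is that the first {{stop()}} wins and 
later ones return without touching the connection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a scheduler backend; not Spark's actual class.
class IdempotentStopSketch {
    private final AtomicBoolean stopped = new AtomicBoolean(false);

    // Records every state that would be reported to the launcher, so the
    // effect of repeated stop() calls is easy to observe.
    final List<String> reportedStates = new ArrayList<>();

    void stop() {
        // Only the first caller flips the flag; any later call returns here
        // and never tries to send on an already-closed connection.
        if (!stopped.compareAndSet(false, true)) {
            return;
        }
        // In a real backend this is where the final state would be sent
        // and the connection closed.
        reportedStates.add("FINISHED");
    }
}
```

With this shape, a second {{stop()}} coming from a launcher handle is a no-op 
instead of a {{setState}} on a closed connection.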

The race you describe also exists; it's not what the exception in the Hive bug 
shows, though.

So perhaps it's better to do a few different things:

- add the checks in your PR
- in LauncherBackend.BackendConnection, set "isDisconnected" before calling 
super.close()
- in that same class, override the "send()" method to ignore "SocketException", 
to handle the second race.
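
The three items above could be sketched roughly as follows. This is Java with 
hypothetical stand-in classes ({{SketchConnection}}, 
{{SketchBackendConnection}}, {{SketchBackend}}), not the real launcher 
classes, and the exact form of the check in the PR is an assumption here.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStream;
import java.net.SocketException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the base connection class.
class SketchConnection implements Closeable {
    private final OutputStream out;
    private volatile boolean closed = false;

    SketchConnection(OutputStream out) {
        this.out = out;
    }

    void send(String msg) throws IOException {
        // Mirrors the kind of check that fails when sending after close.
        if (closed) {
            throw new IllegalStateException("Disconnected.");
        }
        out.write(msg.getBytes(StandardCharsets.UTF_8));
        out.flush();
    }

    @Override
    public void close() throws IOException {
        closed = true;
        out.close();
    }
}

// Hypothetical stand-in for the backend's connection subclass.
class SketchBackendConnection extends SketchConnection {
    volatile boolean isDisconnected = false;

    SketchBackendConnection(OutputStream out) {
        super(out);
    }

    @Override
    public void close() throws IOException {
        // Set the flag first, so callers checking it never see a connection
        // that is already closed but not yet marked as disconnected.
        isDisconnected = true;
        super.close();
    }

    @Override
    void send(String msg) throws IOException {
        try {
            super.send(msg);
        } catch (SocketException e) {
            // The peer went away between the check and the write; ignore.
        }
    }
}

// Hypothetical stand-in for the backend; the check is an assumed form
// of what the PR adds.
class SketchBackend {
    private final SketchBackendConnection connection;

    SketchBackend(SketchBackendConnection connection) {
        this.connection = connection;
    }

    void setState(String state) throws IOException {
        if (connection != null && !connection.isDisconnected) {
            connection.send(state);
        }
    }
}
```

The check in {{setState}} covers the common case, and the {{SocketException}} 
handling covers the window where the socket goes away after the check has 
already passed.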


> LauncherBackend doesn't check state of connection before setting state
> ----------------------------------------------------------------------
>
>                 Key: SPARK-23785
>                 URL: https://issues.apache.org/jira/browse/SPARK-23785
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.0
>            Reporter: Sahil Takiar
>            Priority: Major
>
> Found in HIVE-18533 while trying to integrate with the 
> {{InProcessLauncher}}. {{LauncherBackend}} doesn't check the state of its 
> connection to the {{LauncherServer}} before trying to run {{setState}} - 
> which sends a {{SetState}} message on the connection.



