[ https://issues.apache.org/jira/browse/SPARK-704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated SPARK-704:
----------------------------
Component/s: Spark Core
> ConnectionManager sometimes cannot detect loss of sending connections
> ---------------------------------------------------------------------
>
> Key: SPARK-704
> URL: https://issues.apache.org/jira/browse/SPARK-704
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Reporter: Charles Reiss
> Assignee: Henry Saputra
>
> ConnectionManager currently detects that a SendingConnection has
> disconnected only while it is trying to send through it. As a result, a node
> failure just after a connection is initiated, but before any acknowledgement
> messages can be sent, may result in a hang.
> ConnectionManager has code intended to handle this case by detecting the
> failure of a corresponding ReceivingConnection, but this code assumes that
> the remote host:port of the ReceivingConnection is the same as the
> ConnectionManagerId, which is almost never true. Additionally, there does not
> appear to be any reason to assume a corresponding ReceivingConnection will
> exist.
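
The host:port mismatch described above can be reproduced with plain JDK sockets. The following is a minimal sketch, not Spark's ConnectionManager code: the class name PortMismatchDemo, the method observePorts, and the two ServerSockets standing in for two nodes are all hypothetical. It assumes that a ConnectionManagerId carries the port a node listens on, and shows that the acceptor of a connection instead sees the connecting socket's ephemeral source port.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class; it does not use or mirror Spark's ConnectionManager.
class PortMismatchDemo {

    /**
     * Returns { listening port of node B,
     *           remote port that node A sees on the accepted connection,
     *           local port of node B's outgoing socket }.
     */
    static int[] observePorts() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Each ServerSocket stands in for one node's listening endpoint,
        // i.e. the port we assume a ConnectionManagerId would advertise.
        try (ServerSocket nodeA = new ServerSocket(0, 50, loopback);
             ServerSocket nodeB = new ServerSocket(0, 50, loopback);
             // Node B opens a sending connection to node A.
             Socket outgoing = new Socket(loopback, nodeA.getLocalPort());
             // Node A accepts it; this plays the role of a receiving connection.
             Socket accepted = nodeA.accept()) {
            return new int[] {
                nodeB.getLocalPort(),
                accepted.getPort(),
                outgoing.getLocalPort()
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] ports = observePorts();
        System.out.println("node B listens on port:        " + ports[0]);
        System.out.println("node A sees remote port:       " + ports[1]);
        System.out.println("node B's outgoing source port: " + ports[2]);
    }
}
```

Under these assumptions, the remote port seen on the accepted connection matches the outgoing socket's source port rather than node B's listening port, so a lookup keyed by the listening host:port would not find the accepted connection.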
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)