Marcelo Vanzin created SPARK-11866:
--------------------------------------
Summary: RpcEnv RPC timeouts can lead to errors, leak in transport library.
Key: SPARK-11866
URL: https://issues.apache.org/jira/browse/SPARK-11866
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.6.0
Reporter: Marcelo Vanzin
Priority: Minor
The {{RpcEnv}} code in spark-core has its own timeout handling, which can clash
with the transport library's timeout handling in two ways when the reply to an
RPC message is never sent:
- If the channel has been idle for a while, the transport library closes it
because it may consider the channel hung. This can cause other errors, since
the {{RpcEnv}}-based code might not expect those channels to be closed.
- If the reply never arrives and the channel is not idle, the transport library
keeps state for that RPC that is never cleaned up. The {{RpcEnv}}-level timeout
code should clean up that state, since it is no longer interested in that RPC.
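The cleanup asked for in the second point can be illustrated with a small, self-contained Java sketch. It is not Spark's code: every class and method name here (ResponseHandler, RpcClient, addRequest, removeRequest, handleReply, looksHung) is hypothetical. It assumes the transport layer tracks pending RPCs in a map keyed by request ID, and that its "hung channel" heuristic means "requests still pending and no traffic for a while". The RPC-level timeout removes the request's entry before failing the caller, so nothing is left behind.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; none of these names are Spark's actual API.
class RpcTimeoutSketch {

  /** Callback registered for one outstanding RPC. */
  interface ResponseCallback {
    void onSuccess(String reply);
    void onFailure(Throwable cause);
  }

  /** Stand-in for the transport library's per-channel state. */
  static class ResponseHandler {
    private final Map<Long, ResponseCallback> outstanding = new ConcurrentHashMap<>();
    private volatile long lastActivityNanos = System.nanoTime();

    void addRequest(long id, ResponseCallback cb) { outstanding.put(id, cb); }

    /** Lets the RPC layer discard state for a request it has abandoned. */
    void removeRequest(long id) { outstanding.remove(id); }

    void handleReply(long id, String reply) {
      lastActivityNanos = System.nanoTime();
      ResponseCallback cb = outstanding.remove(id);
      if (cb != null) {
        cb.onSuccess(reply);
      } // else: a late reply for an abandoned request is dropped
    }

    int numOutstanding() { return outstanding.size(); }

    /** Assumed heuristic: requests pending and no traffic for too long. */
    boolean looksHung(long idleTimeoutNanos, long nowNanos) {
      return numOutstanding() > 0 && nowNanos - lastActivityNanos > idleTimeoutNanos;
    }
  }

  /** Stand-in for the RPC layer, which enforces its own timeout. */
  static class RpcClient {
    private final ResponseHandler handler;
    private final AtomicLong nextId = new AtomicLong();
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    RpcClient(ResponseHandler handler) { this.handler = handler; }

    CompletableFuture<String> ask(long timeoutMs) {
      long id = nextId.incrementAndGet();
      CompletableFuture<String> future = new CompletableFuture<>();
      handler.addRequest(id, new ResponseCallback() {
        public void onSuccess(String reply) { future.complete(reply); }
        public void onFailure(Throwable cause) { future.completeExceptionally(cause); }
      });
      // Writing the request to the channel is omitted from this sketch.
      timer.schedule(() -> {
        // Drop the transport-level state first so it cannot leak, then fail the caller.
        // If the reply already arrived, both calls are harmless no-ops.
        handler.removeRequest(id);
        future.completeExceptionally(new TimeoutException("RPC " + id + " timed out"));
      }, timeoutMs, TimeUnit.MILLISECONDS);
      return future;
    }

    void shutdown() { timer.shutdownNow(); }
  }

  public static void main(String[] args) throws Exception {
    ResponseHandler handler = new ResponseHandler();
    RpcClient client = new RpcClient(handler);
    CompletableFuture<String> f = client.ask(50); // no reply is ever sent
    try {
      f.get(1, TimeUnit.SECONDS);
    } catch (ExecutionException e) {
      System.out.println("failed: " + e.getCause().getMessage());
    }
    System.out.println("outstanding after timeout: " + handler.numOutstanding());
    client.shutdown();
  }
}
```

Running `main` prints `failed: RPC 1 timed out` followed by `outstanding after timeout: 0`. Removing the entry before failing the future means a late reply is simply dropped rather than delivered to a caller that has already given up. Under the assumed heuristic, an empty pending map also keeps `looksHung` from flagging the channel, which is one way the two problems in the list could be addressed together.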
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)