Marcelo Vanzin created SPARK-11866:
--------------------------------------

             Summary: RpcEnv RPC timeouts can lead to errors, leak in transport library.
                 Key: SPARK-11866
                 URL: https://issues.apache.org/jira/browse/SPARK-11866
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.6.0
            Reporter: Marcelo Vanzin
            Priority: Minor


The {{RpcEnv}} code in spark-core has its own timeout handling, which can 
clash with the transport library's timeout handling in two ways when the reply 
to an RPC message is never sent:

- If the channel has been idle for a while, the transport library will close 
the channel because it may consider the channel hung. This could cause other 
errors, since the {{RpcEnv}}-based code might not expect those channels to be 
closed.

- If the reply never arrives and the channel is not idle, the network library 
keeps state for that RPC that will never be cleaned up. The {{RpcEnv}}-level 
timeout code should clean up that state, since it is no longer interested in 
that RPC.
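
As an illustration of the second case, here is a minimal sketch of a timeout 
that also drops the per-RPC state instead of only failing the caller. It is 
not Spark code: the class {{RpcTimeoutSketch}}, its methods, and the map of 
outstanding requests are hypothetical stand-ins for the state the network 
library keeps, and only JDK classes are used.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; none of these names come from Spark or its transport library.
class RpcTimeoutSketch {
    // Stand-in for the per-request state kept until a reply arrives.
    private final Map<Long, CompletableFuture<String>> outstanding = new ConcurrentHashMap<>();

    // Daemon timer thread so the sketch never keeps the JVM alive.
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "rpc-timeout");
            t.setDaemon(true);
            return t;
        });

    /** Registers a pending RPC and arms a timeout that also removes its state. */
    CompletableFuture<String> send(long id, long timeoutMs) {
        CompletableFuture<String> reply = new CompletableFuture<>();
        outstanding.put(id, reply);
        timer.schedule(() -> {
            // Remove the entry first, so a late reply finds nothing to complete.
            CompletableFuture<String> pending = outstanding.remove(id);
            if (pending != null) {
                pending.completeExceptionally(
                    new TimeoutException("RPC " + id + " timed out"));
            }
        }, timeoutMs, TimeUnit.MILLISECONDS);
        return reply;
    }

    /** Called when a reply arrives; returns false if the RPC already timed out. */
    boolean onReply(long id, String body) {
        CompletableFuture<String> pending = outstanding.remove(id);
        return pending != null && pending.complete(body);
    }

    /** Number of RPCs that still hold state. */
    int outstandingCount() {
        return outstanding.size();
    }
}
```

The point of the sketch is that whichever side finishes first, the reply or 
the timeout, removes the entry, so nothing is left behind when the reply never 
arrives. When a reply does arrive, the timer task still fires later but finds 
no entry and does nothing.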




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
