[ https://issues.apache.org/jira/browse/SPARK-31219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves resolved SPARK-31219.
-----------------------------------
    Fix Version/s: 3.1.0
                   3.0.0
         Assignee: Manu Zhang
       Resolution: Fixed

> YarnShuffleService doesn't close idle netty channel
> ---------------------------------------------------
>
>                 Key: SPARK-31219
>                 URL: https://issues.apache.org/jira/browse/SPARK-31219
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle
>    Affects Versions: 2.4.5, 3.0.0
>            Reporter: Manu Zhang
>            Assignee: Manu Zhang
>            Priority: Major
>             Fix For: 3.0.0, 3.1.0
>
>
> Recently, we found that our YarnShuffleService has a lot of
> [half-open connections|https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html],
> where the shuffle server still considers a connection established although
> the client has already closed it.
> For example, the server's `ss -nt sport = :7337` output shows
> {code:java}
> ESTAB 0 0 server:7337 client:port
> {code}
> However, on the client, `ss -nt dport = :7337 | grep server` returns nothing.
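The server-versus-client comparison above is effectively a set difference: a server-side `ESTAB` row is half-open when no socket on the client pairs the same two endpoints. Below is a minimal sketch, assuming whitespace-separated `ss -nt` columns (State, Recv-Q, Send-Q, local address, peer address) as in the sample line, and assuming both hosts see the same IP addresses (no NAT in between). `HalfOpenFinder` and its methods are hypothetical names, not part of Spark.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: compares `ss -nt` output from a shuffle server and a
// client to find server-side connections the client no longer has.
final class HalfOpenFinder {

  // Parses `ss -nt` rows into {local, peer} pairs.
  // Assumed columns: State Recv-Q Send-Q Local:Port Peer:Port [...]
  static List<String[]> connections(List<String> ssLines, boolean establishedOnly) {
    List<String[]> result = new ArrayList<>();
    for (String line : ssLines) {
      String[] cols = line.trim().split("\\s+");
      // Skip blank lines, the "State Recv-Q ..." header, and short rows.
      if (cols.length < 5 || cols[0].equals("State")) {
        continue;
      }
      if (establishedOnly && !cols[0].equals("ESTAB")) {
        continue;
      }
      result.add(new String[] {cols[3], cols[4]}); // {local, peer}
    }
    return result;
  }

  // Returns the client endpoints of server-side ESTAB connections that have
  // no matching socket (in any state) on the client.
  static List<String> halfOpen(List<String> serverSs, List<String> clientSs) {
    Set<String> clientPairs = new HashSet<>();
    for (String[] c : connections(clientSs, false)) {
      clientPairs.add(c[0] + "->" + c[1]); // client local -> server
    }
    List<String> result = new ArrayList<>();
    for (String[] s : connections(serverSs, true)) {
      // The server's peer is the client's local address, so flip the pair.
      if (!clientPairs.contains(s[1] + "->" + s[0])) {
        result.add(s[1]);
      }
    }
    return result;
  }
}
```

For instance, feeding it the server rows `ESTAB 0 0 10.0.0.1:7337 10.0.0.2:50000` and `ESTAB 0 0 10.0.0.1:7337 10.0.0.2:50001` together with a client listing that only contains the second connection yields `[10.0.0.2:50000]`. Client rows in any state are counted as a match, so a connection the client is still tearing down (for example in `FIN-WAIT-2`) is not reported as half-open.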
> Looking at the code, `YarnShuffleService` creates its `TransportContext` with
> the two-argument constructor, which leaves `closeIdleConnections` set to false.
> {code:java}
> public class YarnShuffleService extends AuxiliaryService {
>   ...
>   @Override
>   protected void serviceInit(Configuration conf) throws Exception {
>     ...
>     transportContext = new TransportContext(transportConf, blockHandler);
>     ...
>   }
>   ...
> }
>
> public class TransportContext implements Closeable {
>   ...
>   // closeIdleConnections defaults to false in this overload.
>   public TransportContext(TransportConf conf, RpcHandler rpcHandler) {
>     this(conf, rpcHandler, false, false);
>   }
>
>   public TransportContext(TransportConf conf, RpcHandler rpcHandler,
>       boolean closeIdleConnections) {
>     this(conf, rpcHandler, closeIdleConnections, false);
>   }
>   ...
> }
> {code}
> Hence, the channel may never be closed on the server side if the server
> misses the event that the client has closed it.
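To see why the flag matters for half-open channels, here is a simplified model of a periodic idle check. It is an assumption for illustration, not Spark's actual channel-handler code: a channel with no traffic for longer than a timeout is closed if it still has outstanding requests, but an otherwise idle channel is closed only when `closeIdleConnections` is true. `IdlePolicy`, `Action`, and `onIdleCheck` are hypothetical names.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of an idle-timeout decision gated by a
// closeIdleConnections flag. Not Spark's implementation.
final class IdlePolicy {
  enum Action { KEEP_OPEN, CLOSE_TIMED_OUT_REQUESTS, CLOSE_IDLE }

  private final long idleTimeoutNanos;
  private final boolean closeIdleConnections;

  IdlePolicy(long idleTimeout, TimeUnit unit, boolean closeIdleConnections) {
    this.idleTimeoutNanos = unit.toNanos(idleTimeout);
    this.closeIdleConnections = closeIdleConnections;
  }

  // Invoked periodically (e.g. by an idle-state timer) for each channel.
  Action onIdleCheck(long nowNanos, long lastActivityNanos, int outstandingRequests) {
    boolean overdue = nowNanos - lastActivityNanos > idleTimeoutNanos;
    if (!overdue) {
      return Action.KEEP_OPEN;
    }
    if (outstandingRequests > 0) {
      return Action.CLOSE_TIMED_OUT_REQUESTS;
    }
    // A server channel whose client silently went away typically has no
    // outstanding requests, so only this flag-gated branch can reclaim it.
    return closeIdleConnections ? Action.CLOSE_IDLE : Action.KEEP_OPEN;
  }
}
```

Routing every idle check through one decision method makes the flag's effect easy to see: with `closeIdleConnections` false, a channel whose client vanished without the close ever reaching the server stays in `KEEP_OPEN` indefinitely, which is consistent with the lingering `ESTAB` rows reported above.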
> I found that this parameter is set to true for `ExternalShuffleService`.
> Is there any reason for this difference? Can we enable
> closeIdleConnections in YarnShuffleService, or at least add a configuration
> to enable it?
>  
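The configuration option proposed above could be resolved before the `TransportContext` is built. A minimal sketch, assuming the service configuration is available as a plain string map; the key `spark.yarn.shuffle.closeIdleConnections` and the class `ShuffleServiceFlags` are hypothetical, not existing Spark names. The boolean parsing is modeled on Hadoop's `Configuration.getBoolean` (trimmed, case-insensitive `true`/`false`, anything else falls back to the default).

```java
import java.util.Map;

// Hypothetical sketch of an opt-in flag for YarnShuffleService.
// The key name is illustrative only, not a real Spark option.
final class ShuffleServiceFlags {
  static final String CLOSE_IDLE_KEY = "spark.yarn.shuffle.closeIdleConnections"; // hypothetical

  // Trimmed, case-insensitive "true"/"false"; missing, empty, or any other
  // value returns the default.
  static boolean getBoolean(Map<String, String> conf, String key, boolean defaultValue) {
    String raw = conf.get(key);
    if (raw == null) {
      return defaultValue;
    }
    String v = raw.trim();
    if (v.equalsIgnoreCase("true")) {
      return true;
    }
    if (v.equalsIgnoreCase("false")) {
      return false;
    }
    return defaultValue;
  }

  // Defaulting to false keeps the current YarnShuffleService behavior
  // unless an operator opts in.
  static boolean closeIdleConnections(Map<String, String> conf) {
    return getBoolean(conf, CLOSE_IDLE_KEY, false);
  }
}
```

The resolved value would then be passed to the existing three-argument constructor from the excerpt, i.e. `new TransportContext(transportConf, blockHandler, closeIdle)`. Making the default false is the conservative choice for an added option; enabling it unconditionally, as `ExternalShuffleService` does, is the other route the question raises.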



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
