[ 
https://issues.apache.org/jira/browse/SPARK-28239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deegue updated SPARK-28239:
---------------------------
    Description: 
When shuffle tasks run, the shuffle service establishes TCP connections (on 
port 7337 by default).
They look like this:

 !screenshot-1.png! 

However, some of the TCP connections are still busy when the task is actually 
finished. These connections won't close automatically until we restart the 
NodeManager process.

Connections pile up and NodeManagers are getting slower and slower.

 !screenshot-2.png! 
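
To put a number on the pile-up, the connections on the shuffle port can be counted per TCP state. The following is only a rough sketch: it assumes `netstat -ant`-style lines (proto, recv-q, send-q, local address, foreign address, state), the class and method names are hypothetical, and the sample lines in `main` are made up for illustration, not taken from the screenshots.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical diagnostic helper, not part of Spark or YARN.
final class ShufflePortConnections {

    // Counts connections per TCP state whose local address ends with ":<port>".
    // Assumed input: `netstat -ant`-style lines with at least six
    // whitespace-separated fields, the sixth being the state.
    static Map<String, Integer> countByState(List<String> lines, int port) {
        Map<String, Integer> counts = new TreeMap<>();
        String suffix = ":" + port;
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 6 || !fields[0].startsWith("tcp")) {
                continue; // header or non-TCP line
            }
            if (!fields[3].endsWith(suffix)) {
                continue; // local port is not the shuffle service port
            }
            counts.merge(fields[5], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample input, for illustration only.
        List<String> sample = Arrays.asList(
            "tcp   0   0 10.0.0.1:7337   10.0.0.2:51234   ESTABLISHED",
            "tcp   0   0 10.0.0.1:7337   10.0.0.3:40022   ESTABLISHED",
            "tcp   0   0 10.0.0.1:8042   10.0.0.4:33333   TIME_WAIT",
            "tcp   0   0 0.0.0.0:7337    0.0.0.0:*        LISTEN");
        System.out.println(countByState(sample, 7337));
    }
}
```

Running this periodically against real `netstat` output on a NodeManager host would show whether the number of connections on port 7337 keeps growing after jobs finish.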

These unclosed TCP connections stay busy, and setting 
ChannelOption.SO_KEEPALIVE to true, as done in 
[SPARK-23182|https://github.com/apache/spark/pull/20512], does not seem to 
take effect.
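
For reference, here is a minimal sketch of what SO_KEEPALIVE does at the socket level, using plain JDK sockets rather than Netty (the class and method names are hypothetical). Netty's ChannelOption.SO_KEEPALIVE sets the same underlying socket option. One possible explanation for the behaviour above, stated as an assumption and not a confirmed diagnosis: keepalive only makes the kernel probe an idle connection to detect a dead peer (on Linux, after `tcp_keepalive_time`, 7200 seconds by default), so it never closes a connection whose peer is still alive and responding.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical illustration, not Spark code: SO_KEEPALIVE on a JDK socket.
final class KeepAliveSketch {

    // Opens a loopback connection, enables SO_KEEPALIVE on the accepted
    // (server-side) socket, and reports whether the option is set.
    static boolean acceptWithKeepAlive() {
        try (ServerSocket server =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client =
                 new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            // Same option that Netty exposes as ChannelOption.SO_KEEPALIVE.
            accepted.setKeepAlive(true);
            return accepted.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("SO_KEEPALIVE enabled: " + acceptWithKeepAlive());
    }
}
```

If that assumption holds, closing such connections would need an application-level idle timeout on the server side instead of relying on keepalive alone.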



  was:
When shuffle tasks run, the shuffle service establishes TCP connections (on 
port 7337 by default).
They look like this:

 !screenshot-1.png! 

However, some of the TCP connections are still busy when the task is actually 
finished. These connections won't close automatically until we restart the 
NodeManager process.

Connections pile up and NodeManagers are getting slower and slower.

 !screenshot-2.png! 

These unclosed TCP connections stay busy, and it doesn't seem to take effect 
when I set ChannelOption.SO_KEEPALIVE to true according to 


> Make TCP connections created by shuffle service auto close on YARN 
> NodeManagers
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-28239
>                 URL: https://issues.apache.org/jira/browse/SPARK-28239
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, YARN
>    Affects Versions: 2.4.0
>         Environment: Hadoop2.6.0-CDH5.8.3(netty3)
> Spark2.4.0(netty4)
> Configs:
> spark.shuffle.service.enabled=true
>            Reporter: Deegue
>            Priority: Minor
>         Attachments: screenshot-1.png, screenshot-2.png
>


