[ https://issues.apache.org/jira/browse/SPARK-28239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-28239:
----------------------------------
    Affects Version/s:     (was: 2.4.0)
                       3.0.0

> Allow TCP connections created by shuffle service auto close on YARN NodeManagers
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-28239
>                 URL: https://issues.apache.org/jira/browse/SPARK-28239
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, YARN
>    Affects Versions: 3.0.0
>         Environment: Hadoop2.6.0-CDH5.8.3(netty3)
>                      Spark2.4.0(netty4)
>                      Configs:
>                      spark.shuffle.service.enabled=true
>            Reporter: Deegue
>            Priority: Minor
>         Attachments: screenshot-1.png, screenshot-2.png
>
>
> When shuffle tasks are executed, the shuffle service establishes TCP
> connections (on port 7337 by default).
> They look like this:
> !screenshot-1.png!
> However, some of these TCP connections are still busy after the task has
> actually finished. They are not closed automatically and remain open until
> we restart the NodeManager process.
> The connections pile up and the NodeManagers get slower and slower.
> !screenshot-2.png!
> The unclosed TCP connections stay busy, and setting
> ChannelOption.SO_KEEPALIVE to true, as done in
> [SPARK-23182|https://github.com/apache/spark/pull/20512], does not seem to
> have any effect on them.
> The solution is to set ChannelOption.AUTO_CLOSE to true. After this change,
> our cluster (running 10000+ jobs / day) is processing normally.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
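
The pile-up of shuffle-service connections described in the report can be checked on a NodeManager host by counting sockets on port 7337 per TCP state. The following is a minimal sketch, not something taken from the issue: it assumes a Linux host where the kernel exposes sockets in the `/proc/net/tcp` format, and the names `count_states`, `TCP_STATES`, and `SAMPLE` are hypothetical. The sample rows are made-up illustration data, not output from the reporter's cluster.

```python
from collections import Counter

# Subset of the hex TCP state codes used in Linux /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED",
    "06": "TIME_WAIT",
    "08": "CLOSE_WAIT",
    "0A": "LISTEN",
}


def count_states(proc_net_tcp_text, port=7337):
    """Count sockets whose local port equals `port`, grouped by TCP state."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # fields[1] is "HEXADDR:HEXPORT" for the local end of the socket.
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port == port:
            state = fields[3].upper()
            counts[TCP_STATES.get(state, state)] += 1
    return counts


# Made-up sample in /proc/net/tcp layout (0x1CA9 == 7337, 0x0016 == 22).
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:1CA9 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 1001
   1: 0A00000A:1CA9 0B00000A:C350 01 00000000:00000000 00:00000000 00000000  1000        0 1002
   2: 0A00000A:1CA9 0C00000A:C351 01 00000000:00000000 00:00000000 00000000  1000        0 1003
   3: 0A00000A:0016 0C00000A:C352 01 00000000:00000000 00:00000000 00000000     0        0 1004
"""

if __name__ == "__main__":
    print(dict(count_states(SAMPLE)))
```

On a real host, the text would come from reading `/proc/net/tcp` (and `/proc/net/tcp6`, since JVM processes often listen on IPv6 sockets). An ESTABLISHED count that keeps growing while no shuffle tasks are running would match the symptom reported above.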