Aaron Davidson created SPARK-2282:
-------------------------------------

             Summary: PySpark crashes if too many tasks complete quickly
                 Key: SPARK-2282
                 URL: https://issues.apache.org/jira/browse/SPARK-2282
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.0.0, 1.0.1
            Reporter: Aaron Davidson
            Assignee: Aaron Davidson


Upon every task completion, PythonAccumulatorParam constructs a new socket to 
the Accumulator server running inside the pyspark daemon. This can cause a 
buildup of used ephemeral ports from sockets left in the TCP TIME_WAIT state, 
which will cause the SparkContext to crash if too many tasks complete too 
quickly. We ran into this bug with 17k tasks completing in 15 seconds.
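
For illustration, here is a simplified sketch of the per-update pattern 
described above (sendUpdate and its wire format are hypothetical stand-ins, 
not the actual PythonAccumulatorParam code):

import java.io.DataOutputStream
import java.net.Socket

// Each accumulator update opens a fresh connection to the Python
// accumulator server and closes it immediately. The closing side holds
// the ephemeral port in TIME_WAIT (typically ~60s), so thousands of
// updates in a short window can exhaust the local port range.
def sendUpdate(host: String, port: Int, payload: Array[Byte]): Unit = {
  val socket = new Socket(host, port)  // binds a new ephemeral port
  try {
    val out = new DataOutputStream(socket.getOutputStream)
    out.writeInt(payload.length)
    out.write(payload)
    out.flush()
  } finally {
    socket.close()                     // port enters TIME_WAIT here
  }
}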

This bug can be fixed outside of Spark by ensuring these kernel parameters are 
set (on a Linux server):

echo "1" > /proc/sys/net/ipv4/tcp_tw_reuse
echo "1" > /proc/sys/net/ipv4/tcp_tw_recycle

or by adding the SO_REUSEADDR option to the Socket creation within Spark.
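
A minimal sketch of that in-Spark alternative, using the standard java.net 
API (host and port are placeholders for the accumulator server's address):

import java.net.{InetSocketAddress, Socket}

// SO_REUSEADDR must be set before the socket is bound, so create the
// socket unconnected, set the option, then connect.
val socket = new Socket()
socket.setReuseAddress(true)
socket.connect(new InetSocketAddress(host, port))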



--
This message was sent by Atlassian JIRA
(v6.2#6252)
