Aaron Davidson created SPARK-2282:
-------------------------------------
Summary: PySpark crashes if too many tasks complete quickly
Key: SPARK-2282
URL: https://issues.apache.org/jira/browse/SPARK-2282
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 1.0.0, 1.0.1
Reporter: Aaron Davidson
Assignee: Aaron Davidson
Upon every task completion, PythonAccumulatorParam constructs a new socket to
the Accumulator server running inside the pyspark daemon. This causes a
buildup of ephemeral ports held by sockets in the TIME_WAIT state; if too
many tasks complete too quickly, the local port range is exhausted and the
SparkContext crashes. We ran into this bug with 17k tasks completing in 15
seconds.
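
For illustration, a minimal Scala sketch of the per-completion pattern
described above (not Spark's actual code; host, port, and payload are
stand-ins):

    import java.io.DataOutputStream
    import java.net.{InetAddress, Socket}

    object AccumulatorUpdateSketch {
      // Opens a fresh TCP connection for every update. The side that
      // actively closes leaves its ephemeral port in TIME_WAIT (~60s on
      // Linux), so thousands of completions within seconds can exhaust
      // the local port range.
      def sendUpdate(host: String, port: Int, payload: Array[Byte]): Unit = {
        val socket = new Socket(InetAddress.getByName(host), port)
        try {
          val out = new DataOutputStream(socket.getOutputStream)
          out.writeInt(payload.length)
          out.write(payload)
          out.flush()
        } finally {
          socket.close() // active close: this end enters TIME_WAIT
        }
      }
    }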
This bug can be worked around outside of Spark by setting these kernel
parameters (on a Linux server):
echo "1" > /proc/sys/net/ipv4/tcp_tw_reuse
echo "1" > /proc/sys/net/ipv4/tcp_tw_recycle
or by adding the SO_REUSEADDR option to the Socket creation within Spark.
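
A hedged sketch of that in-Spark fix (host and port are placeholders): the
socket must be created unconnected so the option can be set before it binds
to a local port.

    import java.net.{InetSocketAddress, Socket}

    object ReuseAddrSketch {
      def connectReusable(host: String, port: Int): Socket = {
        val socket = new Socket()    // created unconnected and unbound
        socket.setReuseAddress(true) // SO_REUSEADDR: set before binding, so a
                                     // port still in TIME_WAIT may be reused
        socket.connect(new InetSocketAddress(host, port))
        socket
      }
    }

Alternatively, the per-task churn could be avoided altogether by keeping a
single connection to the accumulator server open and reusing it across task
completions.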