[ https://issues.apache.org/jira/browse/SPARK-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071767#comment-14071767 ]
Ken Carlile commented on SPARK-2282: ------------------------------------ Merging just the two files also did not work. I received a bunch of these errors during the test: {code}Exception happened during processing of request from ('127.0.0.1', 33116) Traceback (most recent call last): File "/usr/local/python-2.7.6/lib/python2.7/SocketServer.py", line 295, in _handle_request_noblock self.process_request(request, client_address) File "/usr/local/python-2.7.6/lib/python2.7/SocketServer.py", line 321, in process_request self.finish_request(request, client_address) File "/usr/local/python-2.7.6/lib/python2.7/SocketServer.py", line 334, in finish_request self.RequestHandlerClass(request, client_address, self) File "/usr/local/python-2.7.6/lib/python2.7/SocketServer.py", line 649, in __init__ self.handle() File "/usr/local/spark-current/python/pyspark/accumulators.py", line 224, in handle num_updates = read_int(self.rfile) File "/usr/local/spark-current/python/pyspark/serializers.py", line 337, in read_int raise EOFError EOFError {code} And then it errored out with the usual java thing. > PySpark crashes if too many tasks complete quickly > -------------------------------------------------- > > Key: SPARK-2282 > URL: https://issues.apache.org/jira/browse/SPARK-2282 > Project: Spark > Issue Type: Bug > Components: PySpark > Affects Versions: 0.9.1, 1.0.0, 1.0.1 > Reporter: Aaron Davidson > Assignee: Aaron Davidson > Fix For: 0.9.2, 1.0.0, 1.0.1 > > > Upon every task completion, PythonAccumulatorParam constructs a new socket to > the Accumulator server running inside the pyspark daemon. This can cause a > buildup of used ephemeral ports from sockets in the TIME_WAIT termination > stage, which will cause the SparkContext to crash if too many tasks complete > too quickly. We ran into this bug with 17k tasks completing in 15 seconds. > This bug can be fixed outside of Spark by ensuring these properties are set > (on a linux server); > echo "1" > /proc/sys/net/ipv4/tcp_tw_reuse > echo "1" > /proc/sys/net/ipv4/tcp_tw_recycle > or by adding the SO_REUSEADDR option to the Socket creation within Spark. -- This message was sent by Atlassian JIRA (v6.2#6252)