[jira] [Commented] (SPARK-2282) PySpark crashes if too many tasks complete quickly

Ken Carlile (JIRA) Wed, 23 Jul 2014 07:23:10 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071767#comment-14071767
 ]


Ken Carlile commented on SPARK-2282:
------------------------------------

Merging just the two files also did not work. I received a bunch of these 
errors during the test: 
{code}
Exception happened during processing of request from ('127.0.0.1', 33116)
Traceback (most recent call last):
  File "/usr/local/python-2.7.6/lib/python2.7/SocketServer.py", line 295, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/local/python-2.7.6/lib/python2.7/SocketServer.py", line 321, in process_request
    self.finish_request(request, client_address)
  File "/usr/local/python-2.7.6/lib/python2.7/SocketServer.py", line 334, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/local/python-2.7.6/lib/python2.7/SocketServer.py", line 649, in __init__
    self.handle()
  File "/usr/local/spark-current/python/pyspark/accumulators.py", line 224, in handle
    num_updates = read_int(self.rfile)
  File "/usr/local/spark-current/python/pyspark/serializers.py", line 337, in read_int
    raise EOFError
EOFError
{code}
And then it errored out with the usual Java error. 
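
For context, the EOFError at the bottom of the traceback is the kind of error a length-prefixed reader raises when the peer closes the connection before sending any data. The sketch below is a simplified illustration of that behavior, not the actual contents of serializers.py; the 4-byte big-endian integer format is an assumption.

```python
import io
import struct


def read_int(stream):
    """Read one 4-byte big-endian signed integer from a file-like object.

    Assumption: the wire format is a plain 4-byte network-order int.
    """
    data = stream.read(4)
    if len(data) < 4:
        # The peer closed the connection before a full integer arrived.
        raise EOFError
    return struct.unpack("!i", data)[0]


if __name__ == "__main__":
    # A stream that carries a value reads back normally.
    print(read_int(io.BytesIO(struct.pack("!i", 5))))
    # An empty stream (connection closed with nothing sent) raises EOFError.
    try:
        read_int(io.BytesIO(b""))
    except EOFError:
        print("EOFError: stream ended before an integer was read")
```

If that is what is happening here, each of these errors would correspond to a connection to the accumulator server that was opened and then closed without a complete update being written.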

> PySpark crashes if too many tasks complete quickly
> --------------------------------------------------
>
>                 Key: SPARK-2282
>                 URL: https://issues.apache.org/jira/browse/SPARK-2282
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 0.9.1, 1.0.0, 1.0.1
>            Reporter: Aaron Davidson
>            Assignee: Aaron Davidson
>             Fix For: 0.9.2, 1.0.0, 1.0.1
>
>
> Upon every task completion, PythonAccumulatorParam constructs a new socket to 
> the Accumulator server running inside the pyspark daemon. This can cause a 
> buildup of used ephemeral ports from sockets in the TIME_WAIT termination 
> stage, which will cause the SparkContext to crash if too many tasks complete 
> too quickly. We ran into this bug with 17k tasks completing in 15 seconds.
> This bug can be fixed outside of Spark by ensuring these properties are set 
> (on a Linux server):
> echo "1" > /proc/sys/net/ipv4/tcp_tw_reuse
> echo "1" > /proc/sys/net/ipv4/tcp_tw_recycle
> or by adding the SO_REUSEADDR option to the Socket creation within Spark.
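
The last option in the description, setting SO_REUSEADDR when the socket is created, can be sketched in Python as follows. This only illustrates the socket option itself. The socket the description refers to is created inside Spark, not in user code, and the helper name make_reusable_socket is hypothetical. Whether the option alone avoids the ephemeral port exhaustion has not been verified here.

```python
import socket


def make_reusable_socket():
    """Create an unconnected TCP socket with SO_REUSEADDR enabled.

    Hypothetical helper for illustration; not part of Spark or PySpark.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set the option before bind() or connect(), since it is consulted
    # when the local address is assigned to the socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    return sock


if __name__ == "__main__":
    s = make_reusable_socket()
    try:
        # A nonzero value means the option is enabled on this socket.
        print(s.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) != 0)
    finally:
        s.close()
```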



--
This message was sent by Atlassian JIRA
(v6.2#6252)
