[
https://issues.apache.org/jira/browse/SPARK-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042523#comment-14042523
]
Matthew Farrellee commented on SPARK-2244:
------------------------------------------
I have a theory: after a long bisect session, the following commit was
implicated:
3870248740d83b0292ccca88a494ce19783847f0 is the first bad commit
commit 3870248740d83b0292ccca88a494ce19783847f0
Author: Kay Ousterhout <[email protected]>
Date: Wed Jun 18 13:16:26 2014 -0700
In that commit, stderr is captured into a PIPE for the first time. The
theory is that the pipe's buffer fills and is never drained, so the child
process eventually blocks on a write and communication hangs.
Testing this theory by adding an additional EchoOutputThread for
proc.stderr appears to resolve the issue.
I'll come up with an appropriate fix and send a pull request.
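The theory above can be demonstrated standalone. This is a minimal sketch,
not Spark's actual patch: a child process writes more to stderr than the OS
pipe buffer (typically ~64 KB) can hold, so without a reader its write()
blocks and proc.wait() would hang; a daemon thread that continuously drains
proc.stderr (the role EchoOutputThread plays) lets it finish.

```python
import io
import subprocess
import sys
import threading

def drain(stream, sink):
    # Read until EOF so the OS pipe buffer is continuously emptied;
    # without this, the child blocks once the buffer fills.
    for line in iter(stream.readline, b""):
        sink.write(line.decode())

# Hypothetical child (stands in for the Spark worker): writes ~405 KB
# to stderr, well past a 64 KB pipe buffer.
child_src = ("import sys\n"
             "for _ in range(5000): sys.stderr.write('x' * 80 + '\\n')")

proc = subprocess.Popen([sys.executable, "-c", child_src],
                        stderr=subprocess.PIPE)

captured = io.StringIO()
t = threading.Thread(target=drain, args=(proc.stderr, captured))
t.daemon = True
t.start()

proc.wait()   # completes only because the drain thread keeps reading
t.join()
```

Dropping the thread (or never starting it) reproduces the hang: proc.wait()
never returns because the child is stuck writing to a full pipe.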
> pyspark - RDD action hangs (after previously succeeding)
> --------------------------------------------------------
>
> Key: SPARK-2244
> URL: https://issues.apache.org/jira/browse/SPARK-2244
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 1.1.0
> Environment: system: fedora 20 w/ maven 3.1.1 and openjdk 1.7.0_55 &
> 1.8.0_05
> code: sha b88238fa (master on 23 june 2014)
> cluster: make-distribution.sh followed by ./dist/sbin/start-all.sh (running
> locally)
> Reporter: Matthew Farrellee
> Labels: openjdk, pyspark, python, shell, spark
>
> {code}
> $ ./dist/bin/pyspark
> Python 2.7.5 (default, Feb 19 2014, 13:47:28)
> [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 1.0.0-SNAPSHOT
>       /_/
> Using Python version 2.7.5 (default, Feb 19 2014 13:47:28)
> SparkContext available as sc.
> >>> hundy = sc.parallelize(range(100))
> >>> hundy.count()
> 100
> >>> hundy.count()
> 100
> >>> hundy.count()
> 100
> [repeat until hang, ctrl-C to get]
> >>> hundy.count()
> ^CTraceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/rdd.py", line 774, in count
>     return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
>   File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/rdd.py", line 765, in sum
>     return self.mapPartitions(lambda x: [sum(x)]).reduce(operator.add)
>   File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/rdd.py", line 685, in reduce
>     vals = self.mapPartitions(func).collect()
>   File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/rdd.py", line 649, in collect
>     bytesInJava = self._jrdd.collect().iterator()
>   File "/home/matt/Documents/Repositories/spark/dist/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 535, in __call__
>   File "/home/matt/Documents/Repositories/spark/dist/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 363, in send_command
>   File "/home/matt/Documents/Repositories/spark/dist/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 472, in send_command
>   File "/usr/lib64/python2.7/socket.py", line 430, in readline
>     data = recv(1)
> KeyboardInterrupt
> {code}
--
This message was sent by Atlassian JIRA
(v6.2#6252)