[
https://issues.apache.org/jira/browse/SPARK-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242831#comment-14242831
]
Ilya Ganelin commented on SPARK-4779:
-------------------------------------
I've seen this issue with the Scala API as well. It happens during large
shuffles when an intermediate stage of the shuffle map/reduce fails due to
memory constraints. I have not found any way to resolve it short of
increasing available memory and shuffling smaller sizes.
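The two mitigations above (more memory, smaller shuffle sizes) can be sketched
as configuration, assuming Spark 1.1-era property names; the values below are
illustrative, not recommendations:

```
# spark-defaults.conf -- sketch of the workarounds, assuming Spark 1.1 settings
spark.executor.memory         12g    # more headroom per executor
spark.python.worker.memory    1g     # let the Python worker hold more before spilling to disk
```

In the job itself, raising numPartitions on the wide transformation (e.g.
rdd.reduceByKey(func, numPartitions=2048)) also keeps each per-task shuffle
block, and therefore each spill, smaller.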
> PySpark Shuffle Fails Looking for Files that Don't Exist when low on Memory
> ---------------------------------------------------------------------------
>
> Key: SPARK-4779
> URL: https://issues.apache.org/jira/browse/SPARK-4779
> Project: Spark
> Issue Type: Bug
> Components: PySpark, Shuffle
> Affects Versions: 1.1.0
> Environment: ec2 launched cluster with scripts
> 6 Nodes
> c3.2xlarge
> Reporter: Brad Willard
>
> When Spark is tight on memory, it starts reporting that files don't exist
> during the shuffle, causing tasks to fail and be rebuilt, destroying
> performance.
> The same code works flawlessly with smaller datasets, I assume because of
> less memory pressure.
> 14/12/06 18:39:37 WARN scheduler.TaskSetManager: Lost task 292.0 in stage 3.0
> (TID 1099, ip-10-13-192-209.ec2.internal):
> org.apache.spark.api.python.PythonException: Traceback (most recent call
> last):
> File "/root/spark/python/pyspark/worker.py", line 79, in main
> serializer.dump_stream(func(split_index, iterator), outfile)
> File "/root/spark/python/pyspark/serializers.py", line 196, in dump_stream
> self.serializer.dump_stream(self._batched(iterator), stream)
> File "/root/spark/python/pyspark/serializers.py", line 127, in dump_stream
> for obj in iterator:
> File "/root/spark/python/pyspark/serializers.py", line 185, in _batched
> for item in iterator:
> File "/root/spark/python/pyspark/shuffle.py", line 370, in _external_items
> self.mergeCombiners(self.serializer.load_stream(open(p)),
> IOError: [Errno 2] No such file or directory:
> '/mnt/spark/spark-local-20141206182702-8748/python/16070/66618000/1/18'
>
> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:124)
> org.apache.spark.api.python.PythonRDD$$anon$1.next(PythonRDD.scala:91)
> org.apache.spark.api.python.PythonRDD$$anon$1.next(PythonRDD.scala:87)
>
> org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
> scala.collection.Iterator$$anon$12.next(Iterator.scala:357)
>
> org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
> scala.collection.Iterator$$anon$12.next(Iterator.scala:357)
> scala.collection.Iterator$class.foreach(Iterator.scala:727)
> scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>
> org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:335)
>
> org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
>
> org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
>
> org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
>
> org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
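The IOError in the trace comes from pyspark/shuffle.py re-opening a spilled
partition file during the merge phase. A minimal standalone sketch of that
failure mode (the paths and the "spill" content here are hypothetical; only
the open-a-deleted-file behavior matches the reported error):

```python
import errno
import os
import tempfile

# Sketch: the external merger spills a partition batch to a local file,
# then re-opens it later to merge combiners. If the file has been removed
# in the meantime (e.g. local dirs cleaned up under memory pressure), the
# re-open raises IOError: [Errno 2] No such file or directory, as in the
# traceback above.
spill_dir = tempfile.mkdtemp()
p = os.path.join(spill_dir, "1", "18")  # mimics .../<spill round>/<partition>

os.makedirs(os.path.dirname(p))
with open(p, "w") as f:                 # spill phase: write the batch
    f.write("serialized batch")

os.remove(p)                            # file vanishes before the merge phase

try:
    open(p)                             # merge phase: same open(p) as shuffle.py line 370
except IOError as e:
    assert e.errno == errno.ENOENT      # errno 2, matching the report
    print("IOError: [Errno %d] %s" % (e.errno, p))
```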
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]