[ https://issues.apache.org/jira/browse/SPARK-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kay Ousterhout resolved SPARK-1498.
-----------------------------------
    Resolution: Fixed

> Spark can hang if pyspark tasks fail
> ------------------------------------
>
>                 Key: SPARK-1498
>                 URL: https://issues.apache.org/jira/browse/SPARK-1498
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 0.9.0, 0.9.1, 0.9.2
>            Reporter: Kay Ousterhout
>             Fix For: 1.0.0
>
>
> In pyspark, when some kinds of jobs fail, Spark hangs rather than returning 
> an error.  This is partially a scheduler problem -- the scheduler sometimes 
> thinks failed tasks succeed, even though they have a stack trace and 
> exception.
> You can reproduce this problem with:
> ardd = sc.parallelize([(1,2,3), (4,5,6)])
> brdd = sc.parallelize([(1,2,6), (4,5,9)])
> ardd.join(brdd).count()
> The last line will run forever (the problem in this code is that the RDD 
> entries have 3 values instead of the 2-element (key, value) pairs that join 
> expects).  I haven't verified whether this is a problem for 1.0 as well as 
> 0.9.
> Thanks to Shivaram for helping diagnose this issue!
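The task failure here comes from unpacking each record as a (key, value) pair, which raises for a 3-tuple. The pure-Python sketch below (the `to_pairs` and `join` helpers are illustrative stand-ins, not PySpark's actual implementation) shows both the working 2-tuple case and the failure mode that the scheduler then mishandled:

```python
from collections import defaultdict

def to_pairs(records):
    # Mimics how pair-RDD operations unpack each element as (key, value).
    pairs = []
    for rec in records:
        k, v = rec  # raises ValueError for a 3-tuple like (1, 2, 3)
        pairs.append((k, v))
    return pairs

def join(a, b):
    # Simple hash join over (key, value) pairs, keyed on the first element.
    left = defaultdict(list)
    for k, v in to_pairs(a):
        left[k].append(v)
    return [(k, (v1, v2)) for k, v2 in to_pairs(b) for v1 in left.get(k, [])]

# Well-formed 2-tuples join fine:
print(join([(1, 2), (4, 5)], [(1, 6), (4, 9)]))  # [(1, (2, 6)), (4, (5, 9))]

# The 3-tuples from the reproduction fail to unpack -- this is the kind of
# task failure that left the scheduler hanging instead of surfacing an error:
try:
    join([(1, 2, 3), (4, 5, 6)], [(1, 2, 6), (4, 5, 9)])
except ValueError as e:
    print("task failed:", e)
```

With the fix, such a failure propagates as a Python exception to the driver rather than stalling the job.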



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
