[ 
https://issues.apache.org/jira/browse/SPARK-16599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097302#comment-16097302
 ] 

Ryan Williams commented on SPARK-16599:
---------------------------------------

Update: I think the real story here is the first stack trace above: {{Failed to 
get broadcast_0_piece0 of broadcast_0}}; in my real application, I see that 
error causing jobs to fail without the {{None.get}} occurring.

Curiously, setting {{log4j.logger.org.apache.spark=DEBUG}} causes the error to 
go away in my real application; setting {{log4j.logger.org.apache.spark=DEBUG}} 
in my repro causes the {{None.get}} to go away, but the job still fails due to 
{{Failed to get broadcast_0_piece0 of broadcast_0}}.

[Here is full output of such a repro run with debug logging 
enabled|https://gist.github.com/ryan-williams/564069ba1bd2c052c64be68f7d86c0c5].

The error seems sensitive to the order of creation of:

1. {{Broadcast}} from first {{SparkContext}}
2. Second {{SparkContext}}

The bug occurs when the {{Broadcast}} comes first:

{code}
    // Make a Broadcast
    val bs = sc.broadcast(Set(1, 2, 3))

    // "Accidentally" create second SparkContext
    println(Foo.foo)
{code}

but not when the order is reversed:

{code}
    // "Accidentally" create second SparkContext
    println(Foo.foo)

    // Make a Broadcast
    val bs = sc.broadcast(Set(1, 2, 3))
{code}

This also raises the question of whether everyone in this thread is seeing the 
same issue/exception, since my {{None.get}} seems to be caused by a separate 
issue with block management in the presence of multiple {{SparkContext}}s.

> java.util.NoSuchElementException: None.get  at at 
> org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:343)
> ----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-16599
>                 URL: https://issues.apache.org/jira/browse/SPARK-16599
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>         Environment: centos 6.7   spark 2.0
>            Reporter: binde
>
> run a spark job with spark 2.0, error message
> Job aborted due to stage failure: Task 0 in stage 821.0 failed 4 times, most 
> recent failure: Lost task 0.3 in stage 821.0 (TID 1480, e103): 
> java.util.NoSuchElementException: None.get
>       at scala.None$.get(Option.scala:347)
>       at scala.None$.get(Option.scala:345)
>       at 
> org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:343)
>       at 
> org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:644)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:281)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
