[ https://issues.apache.org/jira/browse/SPARK-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Or updated SPARK-6132:
-----------------------------
Description:
The context cleaner thread is not stopped properly. If a SparkContext is
started immediately after another one stops, the stopped context's cleaner can
end up cleaning variables that belong to the new context.
This is because cleaner.stop() just sets a flag and expects the thread to
terminate asynchronously, but the code that cleans broadcasts goes through
`SparkEnv.get.blockManager`, which by then may belong to the new SparkContext.
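To make the race concrete, here is a rough, simplified sketch (the names below,
e.g. Env and Cleaner, are illustrative stand-ins, not the real
SparkEnv/ContextCleaner code): the cleaning loop dereferences a global mutable
reference on every iteration, and stop() only flips a flag, so an iteration that
is already in flight can hit whatever env the next context installs.
{code}
// Hypothetical illustration only: "Env" stands in for SparkEnv and "Cleaner"
// for ContextCleaner; the real Spark classes look different.
import java.util.concurrent.atomic.AtomicReference

class Env(val owner: String) {
  def removeBroadcast(id: Long): Unit =
    println(s"[$owner] removing broadcast_$id")
}

object Env {
  // Global mutable reference, analogous to SparkEnv.get.
  val current = new AtomicReference[Env](new Env("context-1"))
  def get: Env = current.get()
}

class Cleaner {
  @volatile private var stopped = false

  private val cleaningThread = new Thread {
    override def run(): Unit = {
      while (!stopped) {
        // The cleanup path dereferences the *global* Env.get rather than a
        // reference captured at construction time. If stop() has been called
        // and a new context has already replaced Env.current, this in-flight
        // iteration cleans state owned by the new context.
        Env.get.removeBroadcast(0L)
        Thread.sleep(100)
      }
    }
  }

  def start(): Unit = {
    cleaningThread.setDaemon(true)
    cleaningThread.start()
  }

  // Only flips a flag; returns immediately without waiting for the thread,
  // so a cleanup iteration may still be running after stop() returns.
  def stop(): Unit = {
    stopped = true
  }
}
{code}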
This is likely why `JavaAPISuite`, which creates many back-to-back
SparkContexts, is flaky:
{code}
java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
...
Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
at scala.Option.getOrElse(Option.scala:120)
{code}
The right behavior is for stop() to wait until all currently running cleanup
tasks have finished.
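One way to get that behavior, sketched below with illustrative names (a sketch
under those assumptions, not the actual fix): guard each cleanup task with a
lock and have stop() acquire that lock after setting the flag, so it only
returns once any in-flight task has completed.
{code}
// Hypothetical sketch, not the actual patch: stop() blocks until any in-flight
// cleanup task has completed before it returns.
class BlockingCleaner {
  @volatile private var stopped = false
  // Guards each individual cleanup task.
  private val cleaningLock = new Object

  private val cleaningThread = new Thread {
    override def run(): Unit = {
      while (!stopped) {
        cleaningLock.synchronized {
          // Re-check the flag inside the lock so no new task starts after stop().
          if (!stopped) {
            // ... perform one cleanup task, e.g. remove a broadcast block ...
          }
        }
        Thread.sleep(100)
      }
    }
  }

  def start(): Unit = {
    cleaningThread.setDaemon(true)
    cleaningThread.start()
  }

  def stop(): Unit = {
    stopped = true
    // Taking the lock here means any cleanup task that was already running has
    // finished by the time stop() returns, so it cannot touch the SparkEnv of a
    // context started afterwards.
    cleaningLock.synchronized {}
  }
}
{code}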
was:
The context cleaner thread is not stopped properly. If a SparkContext is
started immediately after another one stops, the stopped context's cleaner can
end up cleaning variables that belong to the new context.
This is because cleaner.stop() just sets a flag and expects the thread to
terminate asynchronously, but the code that cleans broadcasts goes through
`SparkEnv.get.blockManager`, which by then may belong to the new SparkContext.
This is likely why `JavaAPISuite`, which creates many back-to-back
SparkContexts, is flaky:
{code}
java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
at scala.Option.getOrElse(Option.scala:120)
{code}
The right behavior is for stop() to wait until all currently running cleanup
tasks have finished.
> Context cleaner thread lives across SparkContexts
> -------------------------------------------------
>
> Key: SPARK-6132
> URL: https://issues.apache.org/jira/browse/SPARK-6132
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.3.0
> Reporter: Andrew Or
> Assignee: Andrew Or
>
> The context cleaner thread is not stopped properly. If a SparkContext is
> started immediately after another one stops, the stopped context's cleaner can
> end up cleaning variables that belong to the new context.
> This is because cleaner.stop() just sets a flag and expects the thread to
> terminate asynchronously, but the code that cleans broadcasts goes through
> `SparkEnv.get.blockManager`, which by then may belong to the new SparkContext.
> This is likely why `JavaAPISuite`, which creates many back-to-back
> SparkContexts, is flaky:
> {code}
> java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1180)
> at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
> at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
> at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
> ...
> Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0
> at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
> at scala.Option.getOrElse(Option.scala:120)
> {code}
> The right behavior is for stop() to wait until all currently running cleanup
> tasks have finished.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]