[
https://issues.apache.org/jira/browse/SPARK-27666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wenchen Fan resolved SPARK-27666.
---------------------------------
Resolution: Fixed
Fix Version/s: 3.0.0
Issue resolved by pull request 24699
[https://github.com/apache/spark/pull/24699]
> Do not release lock while TaskContext already completed
> -------------------------------------------------------
>
> Key: SPARK-27666
> URL: https://issues.apache.org/jira/browse/SPARK-27666
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Wenchen Fan
> Priority: Major
> Fix For: 3.0.0
>
>
> {code:java}
> Exception in thread "Thread-14" java.lang.AssertionError: assertion failed:
> Block rdd_0_0 is not locked for reading
> at scala.Predef$.assert(Predef.scala:223)
> at
> org.apache.spark.storage.BlockInfoManager.unlock(BlockInfoManager.scala:299)
> at
> org.apache.spark.storage.BlockManager.releaseLock(BlockManager.scala:1000)
> at
> org.apache.spark.storage.BlockManager.$anonfun$getLocalValues$5(BlockManager.scala:746)
> at
> org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:47)
> at
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:36)
> at
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
> at org.apache.spark.rdd.RDDSuite.$anonfun$new$265(RDDSuite.scala:1185)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> We're facing an issue reported by SPARK-18406 and SPARK-25139. And
> [https://github.com/apache/spark/pull/24542] bypassed the issue by capturing
> the assertion error to avoid failing the executor. However, when not using
> pyspark, issue still exists when user implements a custom
> RDD(https://issues.apache.org/jira/browse/SPARK-18406?focusedCommentId=15969384&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15969384)
> or task(see demo below), which spawn a separate thread to consume iterator
> from a cached parent RDD.
> {code:java}
> val rdd0 = sc.parallelize(Range(0, 10), 1).cache()
> rdd0.collect()
> rdd0.mapPartitions { iter =>
> val t = new Thread(new Runnable {
> override def run(): Unit = {
> while(iter.hasNext) {
> println(iter.next())
> Thread.sleep(1000)
> }
> }
> })
> t.setDaemon(false)
> t.start()
> Iterator(0)
> }.collect()
> {code}
> we could easily to reproduce the issue using the demo above.
> If we could prevent the separate thread from releasing lock on block when
> TaskContext has already completed,
> then, we won't hit this issue again.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]