[ https://issues.apache.org/jira/browse/SPARK-27666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wuyi updated SPARK-27666:
-------------------------
Description:
{code:java}
Exception in thread "Thread-14" java.lang.AssertionError: assertion failed: Block rdd_0_0 is not locked for reading
  at scala.Predef$.assert(Predef.scala:223)
  at org.apache.spark.storage.BlockInfoManager.unlock(BlockInfoManager.scala:299)
  at org.apache.spark.storage.BlockManager.releaseLock(BlockManager.scala:1000)
  at org.apache.spark.storage.BlockManager.$anonfun$getLocalValues$5(BlockManager.scala:746)
  at org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:47)
  at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:36)
  at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
  at org.apache.spark.rdd.RDDSuite.$anonfun$new$265(RDDSuite.scala:1185)
  at java.lang.Thread.run(Thread.java:748)
{code}
We're facing the issue reported by SPARK-18406 and SPARK-25139. [https://github.com/apache/spark/pull/24542] worked around it by catching the assertion error so that it doesn't fail the executor. However, even without PySpark, the issue still exists when a user implements a custom RDD (https://issues.apache.org/jira/browse/SPARK-18406?focusedCommentId=15969384&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15969384) or task (see the demo below) that spawns a separate thread to consume the iterator of a cached parent RDD.
{code:java}
// Cache the parent RDD so that reading it goes through the block manager,
// which takes a read lock on block rdd_0_0.
val rdd0 = sc.parallelize(Range(0, 10), 1).cache()
rdd0.collect()
rdd0.mapPartitions { iter =>
  // Hand the cached block's iterator to a separate, non-task thread.
  val t = new Thread(new Runnable {
    override def run(): Unit = {
      while (iter.hasNext) {
        println(iter.next())
        Thread.sleep(1000)
      }
    }
  })
  t.setDaemon(false)
  t.start()
  // The task completes (releasing its block locks) while the thread is
  // still iterating; when the thread drives the iterator to its end, the
  // CompletionIterator tries to unlock the already-unlocked block and
  // trips the assertion above.
  Iterator(0)
}.collect()
{code}
The issue can easily be reproduced with the demo above.
If we prevent the separate thread from releasing the lock on the block once the TaskContext has already completed, we won't hit this issue again.
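A minimal sketch of that guard, assuming the lock-release path can see the owning task's TaskContext (releaseLockIfTaskRunning and doUnlock are illustrative names, not the actual Spark change):
{code:java}
import org.apache.spark.TaskContext

// Hypothetical helper sketching the proposed guard. `doUnlock` stands in
// for the real BlockInfoManager.unlock call, which is Spark-internal.
def releaseLockIfTaskRunning(
    blockId: String,
    taskContext: Option[TaskContext])(doUnlock: String => Unit): Unit = {
  taskContext match {
    case Some(tc) if tc.isCompleted() =>
      // A task's block locks are released in bulk when it completes, so a
      // late unlock from a leaked thread would hit "not locked for reading".
      println(s"Skip releasing lock on $blockId: " +
        s"task ${tc.taskAttemptId()} already completed")
    case _ =>
      doUnlock(blockId)
  }
}
{code}
With a guard like this in the unlock path, the leaked thread's CompletionIterator callback would log and return instead of tripping the assertion in BlockInfoManager.unlock.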
> Do not release lock while TaskContext already completed
> -------------------------------------------------------
>
> Key: SPARK-27666
> URL: https://issues.apache.org/jira/browse/SPARK-27666
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Wenchen Fan
> Priority: Major
>