Nick Pritchard created SPARK-5934:
-------------------------------------
Summary: DStreamGraph.clearMetadata attempts to unpersist the same
RDD multiple times
Key: SPARK-5934
URL: https://issues.apache.org/jira/browse/SPARK-5934
Project: Spark
Issue Type: Bug
Components: Block Manager, Streaming
Affects Versions: 1.2.1
Reporter: Nick Pritchard
Priority: Minor
It seems that, because DStream.clearMetadata calls itself recursively on its
dependencies, it attempts to unpersist the same RDD multiple times, which
results in WARN logs like this:
{quote}
WARN BlockManager: Asked to remove block rdd_2_1, which does not exist
{quote}
or this:
{quote}
WARN BlockManager: Block rdd_2_1 could not be removed as it was not found in
either the disk, memory, or tachyon store
{quote}
This is preceded by logs like:
{quote}
DEBUG TransformedDStream: Unpersisting old RDDs: 2
DEBUG QueueInputDStream: Unpersisting old RDDs: 2
{quote}
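To make the recursion concrete, here is a minimal sketch of the pattern (a
hypothetical class with simplified signatures, not the actual Spark source):
{code:scala}
import scala.collection.mutable

import org.apache.spark.rdd.RDD

// Hypothetical, simplified model of the clearMetadata recursion; the real
// DStream keeps more state, but the double unpersist arises the same way.
class SketchDStream(dependencies: List[SketchDStream]) {
  val generatedRDDs = mutable.HashMap.empty[Long, RDD[_]]

  def clearMetadata(time: Long): Unit = {
    // Unpersist this stream's old RDD for the given batch time...
    generatedRDDs.remove(time).foreach(_.unpersist(blocking = false))
    // ...then recurse into the dependencies. When a dependency generated
    // the *same* RDD object (as with transform(x => x) over a cached
    // input), unpersist runs a second time and BlockManager warns that
    // the blocks no longer exist.
    dependencies.foreach(_.clearMetadata(time))
  }
}
{code}
In the reproducible case below, QueueInputDStream and TransformedDStream both
reference the batch's cached RDD, so each tries to unpersist rdd_2.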
Here is a reproducible case:
{code:scala}
import scala.collection.mutable

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("Test")
    val ssc = new StreamingContext(conf, Seconds(1))
    val queue = new mutable.Queue[RDD[Int]]
    val input = ssc.queueStream(queue)
    // Cache the input and apply an identity transform; both DStreams end
    // up referencing the same cached RDD in each batch.
    val output = input.cache().transform(x => x)
    output.print()
    ssc.start()
    // Feed one single-element RDD per batch interval.
    for (i <- 1 to 5) {
      val rdd = ssc.sparkContext.parallelize(Seq(i))
      queue.enqueue(rdd)
      Thread.sleep(1000)
    }
    ssc.stop()
  }
}
{code}
It doesn't seem to be a fatal error, but the WARN messages are a bit unsettling.