[
https://issues.apache.org/jira/browse/SPARK-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen reopened SPARK-6847:
------------------------------
I think it's worth reopening. I still don't know what the issue is, but
it's not obviously a usage problem. Yes, I meant turning down the batch interval
too, but I suppose you've covered that case. I don't recall hearing of other
issues like this, and what you're describing sounds like it would affect almost
any use of checkpointing, so that's surprising. There could still be an issue in
here, but you may have to lead the debugging if you're seeking a resolution. I
don't have any bright ideas from here.
> Stack overflow on updateStateByKey which followed by a dstream with
> checkpoint set
> ----------------------------------------------------------------------------------
>
> Key: SPARK-6847
> URL: https://issues.apache.org/jira/browse/SPARK-6847
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.3.0
> Reporter: Jack Hu
> Labels: StackOverflowError, Streaming
>
> The issue happens with the following sample code, which uses
> {{updateStateByKey}} followed by a {{map}} with a checkpoint interval of
> 10 seconds:
> {code}
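> import org.apache.spark.SparkConf
> import org.apache.spark.streaming.{Seconds, StreamingContext}
>
> val sparkConf = new SparkConf().setAppName("test")
> // Batch interval of 10 seconds
> val streamingContext = new StreamingContext(sparkConf, Seconds(10))
> streamingContext.checkpoint("""checkpoint""")
> val source = streamingContext.socketTextStream("localhost", 9999)
> // Key every line with 1; the state is the first new line of the batch,
> // or the previous state when the batch has no new lines
> val updatedResult = source.map((1, _)).updateStateByKey(
>   (newlist: Seq[String], oldstate: Option[String]) =>
>     newlist.headOption.orElse(oldstate))
> updatedResult.map(_._2)
>   .checkpoint(Seconds(10))
>   .foreachRDD((rdd, t) => {
>     // Line count of the lineage dump, used as a proxy for dependency depth
>     println("Deep: " + rdd.toDebugString.split("\n").length)
>     println(t.toString() + ": " + rdd.collect.length)
>   })
> streamingContext.start()
> streamingContext.awaitTermination()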
> {code}
> From the output, we can see that the dependency chain keeps growing over
> time, the {{updateStateByKey}} stream never gets checkpointed, and
> eventually a stack overflow occurs.
> Note:
> * The RDD in {{updatedResult.map(_._2)}} gets checkpointed in this case, but
> the {{updateStateByKey}} RDD does not
> * If the {{checkpoint(Seconds(10))}} is removed from the map result
> ( {{updatedResult.map(_._2)}} ), the stack overflow does not happen