[ 
https://issues.apache.org/jira/browse/SPARK-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reopened SPARK-6847:
------------------------------

I think it's worth reopening although I still don't know what the issue is, but 
it's not obviously a usage problem. Yes I meant turn down the batch interval 
too but I suppose you've covered that case. I don't recall hearing other issues 
like this, and what you're describing sounds like it would affect most any use 
of checkpointing, so that's surprising. Still could be some issue in here but 
you may have to lead debugging if you're seeking a resolution. I don't have 
bright ideas from here.

> Stack overflow on updateStateByKey which followed by a dstream with 
> checkpoint set
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-6847
>                 URL: https://issues.apache.org/jira/browse/SPARK-6847
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.3.0
>            Reporter: Jack Hu
>              Labels: StackOverflowError, Streaming
>
> The issue happens with the following sample code: uses {{updateStateByKey}} 
> followed by a {{map}} with checkpoint interval 10 seconds
> {code}
>     val sparkConf = new SparkConf().setAppName("test")
>     val streamingContext = new StreamingContext(sparkConf, Seconds(10))
>     streamingContext.checkpoint("""checkpoint""")
>     val source = streamingContext.socketTextStream("localhost", 9999)
>     val updatedResult = source.map(
>         (1,_)).updateStateByKey(
>             (newlist : Seq[String], oldstate : Option[String]) =>     
> newlist.headOption.orElse(oldstate))
>     updatedResult.map(_._2)
>     .checkpoint(Seconds(10))
>     .foreachRDD((rdd, t) => {
>       println("Deep: " + rdd.toDebugString.split("\n").length)
>       println(t.toString() + ": " + rdd.collect.length)
>     })
>     streamingContext.start()
>     streamingContext.awaitTermination()
> {code}
> From the output, we can see that the dependency will be increasing time over 
> time, the {{updateStateByKey}} never get check-pointed,  and finally, the 
> stack overflow will happen. 
> Note:
> * The rdd in {{updatedResult.map(_._2)}} get check-pointed in this case, but 
> not the {{updateStateByKey}} 
> * If remove the {{checkpoint(Seconds(10))}} from the map result ( 
> {{updatedResult.map(_._2)}} ), the stack overflow will not happen



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to