Hello Dachuan, RDDs generated by StateDStream are checkpointed because the tree of RDD dependencies (i.e. the RDD lineage) can grow indefinitely as each state RDD depends on the state RDD from the previous batch of data. Checkpointing save an RDD to HDFS to cuts of all ties to its parent RDDs (i.e. truncates the lineage). If you do not periodically checkpoint of the state RDDs, these really large lineages can lead to all sorts of problems. The "mustCheckpoint" field ensures that state RDDs are automatically checkpointed with some periodicity even if the user does not explicitly specify one. Setting mustCheckpoint to false disables this automatic checkpointing. I think that is leading to really large lineages, and serializing the RDD with its lineage is causing the stack to overflow.
On that note, what are you trying to achieve by setting mustCheckpoint = false? Maybe there is another way of achieving what you are trying to achieve. TD On Tue, Dec 24, 2013 at 9:05 AM, Dachuan Huang <huan...@cse.ohio-state.edu>wrote: > Hello, developers, > > Just out of curiosity, I have changed the "mustCheckpoint" in > StateDStream.scala to "false" by default. And run the > StatefulNetworkWordCount.scala example. > > My input is a 3MB/s speed Serversocket. > > It reports the following error after some time, the exception trace didn't > say anything about the spark code, so I don't know how to nail down the > root cause, can anybody help me with this? thanks. > > Exception in thread "DAGScheduler" java.lang.StackOverflowError > at > java.io.ObjectStreamClass.getPrimFieldValues(ObjectStreamClass.java:1233) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1532) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) > at > > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) > at > > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) > at scala.collection.immutable.$colon$colon.writeObject(List.scala:430) > at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) > at > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495) > at > > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) > at >