It is kind of unexpected; I can't imagine a real scenario under which it should trigger. But obviously I am missing something :)
TD

On Tue, Apr 21, 2015 at 4:23 PM, Jean-Pascal Billaud <j...@tellapart.com> wrote:

> Sure. But in general, I am assuming this "Graph is unexpectedly null
> when DStream is being serialized" must mean something. Under which
> circumstances would such an exception trigger?
>
> On Tue, Apr 21, 2015 at 4:12 PM, Tathagata Das <t...@databricks.com>
> wrote:
>
>> Yeah, I am not sure what is going on. The only way to figure it out is
>> to take a look at the disassembled bytecode using javap.
>>
>> TD
>>
>> On Tue, Apr 21, 2015 at 1:53 PM, Jean-Pascal Billaud <j...@tellapart.com>
>> wrote:
>>
>>> At this point I am assuming that nobody has an idea... I am still going
>>> to give it a last shot just in case it was missed by some people :)
>>>
>>> Thanks,
>>>
>>> On Mon, Apr 20, 2015 at 2:20 PM, Jean-Pascal Billaud <j...@tellapart.com>
>>> wrote:
>>>
>>>> Hey, so I start the context at the very end, when all the piping is
>>>> done. BTW, a foreachRDD will be called on the resulting dstream.map()
>>>> right after that.
>>>>
>>>> The puzzling thing is why removing the context bounds solves the
>>>> problem... What does this exception mean in general?
>>>>
>>>> On Mon, Apr 20, 2015 at 1:33 PM, Tathagata Das <t...@databricks.com>
>>>> wrote:
>>>>
>>>>> When are you getting this exception? After starting the context?
>>>>>
>>>>> TD
>>>>>
>>>>> On Mon, Apr 20, 2015 at 10:44 AM, Jean-Pascal Billaud <
>>>>> j...@tellapart.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am getting this serialization exception, and I am not too sure what
>>>>>> "Graph is unexpectedly null when DStream is being serialized" means.
>>>>>>
>>>>>> 15/04/20 06:12:38 INFO yarn.ApplicationMaster: Final app status:
>>>>>> FAILED, exitCode: 15, (reason: User class threw exception: Task not
>>>>>> serializable)
>>>>>> Exception in thread "Driver" org.apache.spark.SparkException: Task
>>>>>> not serializable
>>>>>>         at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
>>>>>>         at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
>>>>>>         at org.apache.spark.SparkContext.clean(SparkContext.scala:1435)
>>>>>>         at org.apache.spark.streaming.dstream.DStream.map(DStream.scala:438)
>>>>>>         [...]
>>>>>> Caused by: java.io.NotSerializableException: Graph is unexpectedly
>>>>>> null when DStream is being serialized.
>>>>>>         at org.apache.spark.streaming.dstream.DStream$$anonfun$writeObject$1.apply$mcV$sp(DStream.scala:420)
>>>>>>         at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:985)
>>>>>>         at org.apache.spark.streaming.dstream.DStream.writeObject(DStream.scala:403)
>>>>>>
>>>>>> The operation comes down to something like this:
>>>>>>
>>>>>> dstream.map(tuple => {
>>>>>>   val w = StreamState.fetch[K, W](state.prefixKey, tuple._1)
>>>>>>   (tuple._1, (tuple._2, w))
>>>>>> })
>>>>>>
>>>>>> And StreamState is a very simple standalone object:
>>>>>>
>>>>>> object StreamState {
>>>>>>   def fetch[K : ClassTag : Ordering, V : ClassTag](prefixKey: String, key: K): Option[V] = None
>>>>>> }
>>>>>>
>>>>>> However, if I remove the context bounds from K in fetch, i.e. removing
>>>>>> ClassTag and Ordering, then everything is fine.
>>>>>>
>>>>>> If anyone has some pointers, I'd really appreciate it.
>>>>>>
>>>>>> Thanks,
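
A plausible reading of the stack trace, since the thread itself never reaches a confirmed diagnosis: a context bound is only sugar for an extra implicit parameter list, so the two versions of fetch are not equivalent at the call site. A minimal sketch of the desugaring (fetchDesugared is illustrative, not part of the original code):

import scala.reflect.ClassTag

object StreamState {
  // As written in the thread: K carries two context bounds.
  def fetch[K : ClassTag : Ordering, V : ClassTag](prefixKey: String, key: K): Option[V] = None

  // Roughly what the compiler turns that into: three implicit
  // evidence arguments that must be resolved at every call site.
  def fetchDesugared[K, V](prefixKey: String, key: K)(implicit
      kTag: ClassTag[K],
      kOrd: Ordering[K],
      vTag: ClassTag[V]): Option[V] = None
}

If K is a type parameter of the enclosing class or method, the ClassTag[K] and Ordering[K] evidence values live in the enclosing scope, so the function passed to dstream.map has to capture the enclosing instance to supply them. When that instance (transitively) references a DStream, the ClosureCleaner's serializability check ends up serializing the DStream outside of the streaming graph, and DStream.writeObject throws exactly the "Graph is unexpectedly null when DStream is being serialized" error shown above (DStream.scala:420). Removing the context bounds removes the implicit arguments, and with them the captured reference, which would explain why the exception disappears.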