You can create lazily instantiated singleton instances. See http://spark.apache.org/docs/latest/streaming-programming-guide.html#accumulators-broadcast-variables-and-checkpoints for examples of accumulators and broadcast variables. You can use the same approach to create your cached RDD.
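A minimal sketch of that pattern applied to an RDD (the object name `CachedRDD` is illustrative, following the `WordBlacklist`/`DroppedWordsCounter` examples in the linked guide). The singleton is re-created lazily on driver restart, which avoids the stale-RDD-reference problem after recovering from a checkpoint:

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    object CachedRDD {
      // Lazily instantiated, so it is rebuilt after recovery from a checkpoint
      @volatile private var instance: RDD[String] = null

      def getInstance(sc: SparkContext): RDD[String] = {
        if (instance == null) {
          synchronized {
            if (instance == null) {
              instance = sc.parallelize(Seq("apple", "banana")).cache()
            }
          }
        }
        instance
      }
    }

Inside foreachRDD you would then fetch it from the incoming RDD's context, e.g. `val cached = CachedRDD.getInstance(rdd.sparkContext)`, rather than closing over an RDD created outside the streaming operations.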
On Tue, Feb 7, 2017 at 10:45 AM, shyla deshpande <deshpandesh...@gmail.com> wrote:

> and my cached RDD is not small. If it was maybe I could materialize and
> broadcast.
>
> Thanks
>
> On Tue, Feb 7, 2017 at 10:28 AM, shyla deshpande <deshpandesh...@gmail.com>
> wrote:
>
>> I have a situation similar to the following and I get SPARK-13758
>> <https://issues.apache.org/jira/browse/SPARK-13758>.
>>
>> I understand why I get this error, but I want to know what should be the
>> approach in dealing with these situations.
>>
>> Thanks
>>
>> var cached = ssc.sparkContext.parallelize(Seq("apple, banana"))
>> val words = ssc.socketTextStream(ip, port).flatMap(_.split(" "))
>> words.foreachRDD((rdd: RDD[String]) => {
>>   val res = rdd.map(word => (word, word.length)).collect()
>>   println("words: " + res.mkString(", "))
>>   cached = cached.union(rdd)
>>   cached.checkpoint()
>>   println("cached words: " + cached.collect.mkString(", "))
>> })