Have you tried declaring this class with "extends Serializable", meaning java.io.Serializable?
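
Something like the following, an untested sketch against the Spark 1.x RDD API, using the class and field names from your snippet:

import org.apache.spark.rdd.RDD

// Making the enclosing class serializable lets Spark ship it to the
// executors when the closure references the field `n` (which implicitly
// captures `this`).
class TextToWordVector(csvData: RDD[Array[String]]) extends Serializable {
  val n = 1
  lazy val x = csvData.map { stringArr => stringArr(n) }.collect()
}

Alternatively (also just a sketch, the class name is hypothetical), you can copy the field into a local val before building the closure, so it captures only the local value rather than `this`, and the class doesn't need to be Serializable at all:

class TextToWordVectorLocal(csvData: RDD[Array[String]]) {
  val n = 1
  lazy val x = {
    val localN = n // captured by value, not via `this`
    csvData.map { stringArr => stringArr(localN) }.collect()
  }
}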
On Thu, Jul 3, 2014 at 12:24 PM, Ulanov, Alexander <alexander.ula...@hp.com> wrote:
> Hi,
>
> I wonder how I can pass parameters to RDD functions with closures. If I do it
> in the following way, Spark crashes with NotSerializableException:
>
> class TextToWordVector(csvData: RDD[Array[String]]) {
>   val n = 1
>   lazy val x = csvData.map { stringArr => stringArr(n) }.collect()
> }
>
> Exception:
> Job aborted due to stage failure: Task not serializable:
> java.io.NotSerializableException: org.apache.spark.mllib.util.TextToWordVector
> org.apache.spark.SparkException: Job aborted due to stage failure: Task not
> serializable: java.io.NotSerializableException:
> org.apache.spark.mllib.util.TextToWordVector
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1038)
>
> This message proposes a workaround, but it didn't work for me:
> http://mail-archives.apache.org/mod_mbox/spark-user/201404.mbox/%3CCAA_qdLrxXzwXd5=6SXLOgSmTTorpOADHjnOXn=tMrOLEJM=f...@mail.gmail.com%3E
>
> What is the best practice?
>
> Best regards, Alexander