Either Serializable works; Scala's Serializable extends Java's (it was originally intended as a common interface for people who didn't want to run Scala on a JVM).
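To make that concrete, a rough sketch using the class from your example (the second class name is just for illustration; either form works):

import org.apache.spark.rdd.RDD

// Marking the class Serializable lets Spark ship the instance (and its
// fields, including n) to executors when a closure references them.
class TextToWordVector(csvData: RDD[Array[String]]) extends Serializable {
  val n = 1
  lazy val x = csvData.map { stringArr => stringArr(n) }.collect()
}

// Equivalent, naming the Java interface explicitly:
class TextToWordVectorExplicit(csvData: RDD[Array[String]]) extends java.io.Serializable {
  val n = 1
  lazy val x = csvData.map { stringArr => stringArr(n) }.collect()
}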
Class fields require the class to be serialized along with the object in order to access them. If you declared "val n" inside a method's scope instead, though, we wouldn't need the class. E.g.:

class TextToWordVector(csvData: RDD[Array[String]]) {
  def computeX() = {
    val n = 1
    csvData.map { stringArr => stringArr(n) }.collect()
  }
  lazy val x = computeX()
}

Note that if the class itself doesn't actually contain many (large) fields, it may not be an issue to transfer it around.

On Thu, Jul 3, 2014 at 5:21 AM, Ulanov, Alexander <alexander.ula...@hp.com> wrote:
> Thanks, this works with both Scala and Java Serializable. Which one should
> I use?
>
> Related question: I would like only that particular val to be used instead
> of the whole class; what should I do?
> As far as I understand, the whole class is serialized and transferred
> between nodes (am I right?)
>
> Alexander
>
> -----Original Message-----
> From: Sean Owen [mailto:so...@cloudera.com]
> Sent: Thursday, July 03, 2014 3:31 PM
> To: dev@spark.apache.org
> Subject: Re: Pass parameters to RDD functions
>
> Declare this class with "extends Serializable", meaning
> java.io.Serializable?
>
> On Thu, Jul 3, 2014 at 12:24 PM, Ulanov, Alexander <alexander.ula...@hp.com> wrote:
> > Hi,
> >
> > I wonder how I can pass parameters to RDD functions with closures. If I
> > do it in the following way, Spark crashes with a NotSerializableException:
> >
> > class TextToWordVector(csvData: RDD[Array[String]]) {
> >
> >   val n = 1
> >   lazy val x = csvData.map { stringArr => stringArr(n) }.collect()
> > }
> >
> > Exception:
> > Job aborted due to stage failure: Task not serializable:
> > java.io.NotSerializableException:
> > org.apache.spark.mllib.util.TextToWordVector
> > org.apache.spark.SparkException: Job aborted due to stage failure: Task
> > not serializable: java.io.NotSerializableException:
> > org.apache.spark.mllib.util.TextToWordVector
> >   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1038)
> >
> >
> > This message proposes a workaround, but it didn't work for me:
> > http://mail-archives.apache.org/mod_mbox/spark-user/201404.mbox/%3CCAA_qdLrxXzwXd5=6SXLOgSmTTorpOADHjnOXn=tMrOLEJM=f...@mail.gmail.com%3E
> >
> > What is the best practice?
> >
> > Best regards, Alexander
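P.S. One more idiom, not from this thread but worth noting as a rough sketch: if you want to keep n as a class field, you can copy it into a local val right before the closure, so only the value is captured rather than the enclosing instance (localN is just an illustrative name):

import org.apache.spark.rdd.RDD

class TextToWordVector(csvData: RDD[Array[String]]) {
  val n = 1

  lazy val x = {
    // The closure references only localN (an Int), not this.n, so the
    // TextToWordVector instance never has to be serialized.
    val localN = n
    csvData.map { stringArr => stringArr(localN) }.collect()
  }
}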