Hi Arijit, my understanding is the following:

RDD actions will at some point call the runJob method of a SparkContext.
That runJob method calls the clean method, which in turn calls
ClosureCleaner.clean. This strips unneeded references from closures and also
checks whether they are serializable.

The next step is to submit the job to the DAGScheduler, which in turn has a
closureSerializer that is just an instance of a normal
org.apache.spark.serializer.Serializer:

    val closureSerializer = instantiateClassFromConf[Serializer](
      "spark.closure.serializer", "org.apache.spark.serializer.JavaSerializer")

This serializes the closure to a byte array, which can then be shipped.

I might be wrong here but I hope it helps :)

Cheers,
Lars

On Wed, Oct 14, 2015 at 11:23 PM, Arijit <arij...@live.com> wrote:

> Hi Xiao,
>
> Thank you very much for the pointers. I looked into that part of the
> code. I now understand how the main method is invoked. It is still not
> clear to me how the code is distributed to the executors. Is it the whole
> jar or some serialized object? I was expecting to see the part of the code
> where the closures are serialized and shipped. Maybe I am missing something.
>
> Thanks again, Arijit
>
> ------------------------------
> Date: Thu, 8 Oct 2015 10:26:55 -0700
> Subject: Re: Understanding code/closure shipment to Spark workers
> From: gatorsm...@gmail.com
> To: arij...@live.com
> CC: dev@spark.apache.org
>
> Hi, Arijit,
>
> The code flow of spark-submit is simple.
>
> Enter the main function of SparkSubmit.scala
>   --> case SparkSubmitAction.SUBMIT => submit(appArgs)
>   --> doRunMain() in function submit() in the same file
>   --> runMain(childArgs,...) in the same file
>   --> mainMethod.invoke(null, childArgs.toArray) in the same file
>
> The invoke() function is provided by Java reflection for invoking the main
> function of your JAR.
>
> Hopefully, it can help you understand the problem.
>
> Thanks,
>
> Xiao Li
>
> 2015-10-07 16:47 GMT-07:00 Arijit <arij...@live.com>:
>
> Hi,
>
> I want to understand the code flow starting from the Spark jar that I
> submit through spark-submit: how does Spark identify and extract the
> closures, clean and serialize them, and ship them to workers to execute as
> tasks? Can someone point me to any documentation or to the source
> code path to help me understand this?
>
> Thanks, Arijit
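
PS: To illustrate the "serialize the closure to a byte array" step from the top of this mail, here is a minimal sketch in plain Java with no Spark dependency. It is not Spark's actual code; it only shows the kind of round trip that Java serialization performs. The names ClosureShipping, SerializableFunction, adder, serialize and deserialize are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class ClosureShipping {

    // Hypothetical helper type: a function that is also Serializable,
    // so a lambda assigned to it can go through Java serialization.
    public interface SerializableFunction<T, R> extends Function<T, R>, Serializable {}

    // Builds a closure that captures the local value `offset`.
    public static SerializableFunction<Integer, Integer> adder(int offset) {
        return x -> x + offset;
    }

    // "Driver side" in this sketch: turn the closure into a byte array.
    public static byte[] serialize(Object closure) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(closure);
        }
        return bytes.toByteArray();
    }

    // "Executor side" in this sketch: rebuild the closure from the bytes.
    @SuppressWarnings("unchecked")
    public static <T> T deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] shipped = serialize(adder(10));
        SerializableFunction<Integer, Integer> restored = deserialize(shipped);
        System.out.println(restored.apply(32)); // prints 42
    }
}
```

Note that Java serialization writes only the class name and the captured state (here the value of offset), not the bytecode. The JVM that deserializes the bytes therefore needs the defining class on its own classpath.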