Maybe I can help a bit. What happens when you call .map(myFunc) is
that you create a MapPartitionsRDD that has a reference to that
closure in its compute() function. When a job is run (jobs are run as
the result of RDD actions):

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L520
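
To make the first step concrete, here is a minimal sketch of that closure
capture. The classes (MiniRDD, SeqRDD, MapPartitionsMiniRDD) are made-up
stand-ins, not Spark's real ones: the point is just that map() records the
function inside a new RDD's compute() and nothing runs yet.

```scala
// Hypothetical mini classes (not Spark's real API) sketching how map()
// stores the user closure in a new RDD; nothing executes until someone
// pulls on the iterator returned by compute().
abstract class MiniRDD[T] {
  def compute(): Iterator[T]
  def map[U](f: T => U): MiniRDD[U] = new MapPartitionsMiniRDD(this, f)
}

class SeqRDD[T](data: Seq[T]) extends MiniRDD[T] {
  def compute(): Iterator[T] = data.iterator
}

class MapPartitionsMiniRDD[T, U](parent: MiniRDD[T], f: T => U)
    extends MiniRDD[U] {
  // The closure f is held here; compute() applies it lazily.
  def compute(): Iterator[U] = parent.compute().map(f)
}
```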

When this happens, the RDD will generate ShuffleMapTasks for
physically computing the MapPartitionsRDD. The ShuffleMapTask will be
shipped to the executor and then run():

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala#L70

The ShuffleMapTask will call rdd.iterator(), which will eventually
call into compute():

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L240
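
The whole chain above (task calls iterator(), iterator() falls through to
compute(), compute() is where the user's x => x + 1 finally executes) can be
sketched end to end. All the names here (SketchRDD, ParallelizeRDD, MappedRDD,
TaskSketch) are hypothetical, not Spark's actual classes, and real Spark's
iterator() also checks for cached or checkpointed partitions first.

```scala
// Self-contained sketch of the call chain: a ShuffleMapTask-like driver
// drains rdd.iterator(), which delegates to compute(), which applies the
// user function element by element.
abstract class SketchRDD[T] {
  def compute(): Iterator[T]
  // Real Spark checks the cache/checkpoint here; this sketch just delegates.
  final def iterator(): Iterator[T] = compute()
}

class ParallelizeRDD(data: Seq[Int]) extends SketchRDD[Int] {
  def compute(): Iterator[Int] = data.iterator
}

class MappedRDD[T, U](parent: SketchRDD[T], f: T => U) extends SketchRDD[U] {
  // Pipelining: f runs as the consumer pulls elements, so chained maps
  // never materialize an intermediate collection between them.
  def compute(): Iterator[U] = parent.iterator().map(f)
}

object TaskSketch {
  // Stand-in for ShuffleMapTask.runTask: drive the iterator to completion.
  def runTask(rdd: SketchRDD[Int]): List[Int] = rdd.iterator().toList
}
```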

- Patrick

On Fri, May 1, 2015 at 2:06 PM, Tom Hubregtsen <thubregt...@gmail.com> wrote:
> I am trying to understand the data and computation flow in Spark. I believe I
> fairly understand the shuffle (both the map and reduce side), but I do not get
> what happens to the computation from the map stages. I know all maps get
> pipelined up to the shuffle (when there is no other action in between), but I
> cannot find where the actual computation for the map happens (for instance,
> for rdd.map(x => x + 1), where does the +1 happen?). Any pointers to files or
> functions are appreciated.
>
> I know compute of rdd/MapPartitionsRDD.scala gets called, but I lose track
> of the lambda function after this.
>
> Thanks,
>
> Tom
>
> --
> View this message in context: 
> http://apache-spark-developers-list.1001551.n3.nabble.com/What-is-the-location-in-the-source-code-of-the-computation-of-the-elements-in-a-map-transformation-tp11971.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>

