Re: RDD replication in Spark

Cheng Lian Wed, 27 Aug 2014 14:08:46 -0700

You may start from here:
<https://github.com/apache/spark/blob/4fa2fda88fc7beebb579ba808e400113b512533b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L706-L712>
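The linked lines appear to be where the block manager starts replication once a block has been stored locally. As a rough mental model only, here is a self-contained Scala sketch. Level, choosePeers and put are hypothetical names, not Spark APIs. The peer-selection rule (walk the registered nodes starting just after the local one, wrapping around) and the capping of peers at the number of other nodes are assumptions made for illustration, so check them against the actual source.

```scala
// Hypothetical, simplified model of replicated block storage.
// None of these names are Spark APIs; they only illustrate the idea.
object ReplicationSketch {

  // Mirrors the flags discussed in the thread: memory, disk, replication count.
  final case class Level(useMemory: Boolean, useDisk: Boolean, replication: Int) {
    require(replication >= 1, "replication must be at least 1")
  }

  // Assumption: peers are chosen by walking the registered nodes in order,
  // starting right after the local node and wrapping around.
  // The result is capped at the number of other nodes, so no peer repeats.
  def choosePeers(nodes: Vector[String], self: String, count: Int): Vector[String] = {
    val selfIndex = nodes.indexOf(self)
    require(selfIndex >= 0, s"$self is not a registered node")
    val others = nodes.length - 1
    (1 to math.min(count, others)).map(i => nodes((selfIndex + i) % nodes.length)).toVector
  }

  // Store the block locally, then send one copy to each chosen peer.
  // Copies are stored with replication = 1 so peers do not replicate again.
  def put(nodes: Vector[String], self: String, block: String, level: Level): Map[String, (String, Level)] = {
    val local = self -> (block, level)
    val copies = choosePeers(nodes, self, level.replication - 1)
      .map(peer => peer -> (block, level.copy(replication = 1)))
    (local +: copies).toMap
  }
}
```

For example, calling put with nodes a, b, c, d from node "b" with replication = 2 keeps the original on "b" and places one copy on "c".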



On Mon, Aug 25, 2014 at 9:05 PM, rapelly kartheek <[email protected]>
wrote:

> Hi,
>
> I've exercised multiple options available for persist(), including RDD
> replication. I have gone through the classes involved in caching/storing
> the RDDs at different levels. The StorageLevel class plays a pivotal role by
> recording whether to use memory or disk and whether to replicate the RDD on
> multiple nodes.
> The class LocationIterator iterates over the preferred machines one by one
> for each partition that is replicated. I have a rough idea of CoalescedRDD.
> Please correct me if I am wrong.
>
> But I am looking for the code that chooses the resources to replicate the
> RDDs. Can someone please tell me how replication takes place and how we
> choose the resources for replication? I just want to know where I should
> look to understand how the replication happens.
>
> Thank you so much!!!
>
> regards
>
> -Karthik
>
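The LocationIterator description above (visiting the preferred machines one by one for each replicated partition) can be modeled in plain Scala. This is a sketch under an assumption, not Spark's class: it assumes each partition lists its replica hosts in order, and that the iteration yields the first host of every partition, then the second host of every partition, and so on. The object and method names are hypothetical. Note that an iterator like this only reads where copies already live; deciding where new copies go happens when the block is written.

```scala
// Hypothetical model of a "preferred locations" iteration order.
// prefLocs(p) holds the hosts of partition p's replicas, in order.
// This is an illustration, not Spark's LocationIterator.
object LocationOrderSketch {

  // Yields (host, partitionIndex) pairs: replica 0 of every partition first,
  // then replica 1 of every partition, and so on.
  def locationOrder(prefLocs: Vector[Seq[String]]): Iterator[(String, Int)] = {
    val maxReplicas = if (prefLocs.isEmpty) 0 else prefLocs.map(_.size).max
    Iterator.range(0, maxReplicas).flatMap { r =>
      prefLocs.iterator.zipWithIndex.collect {
        // Skip partitions that have fewer than r + 1 replicas.
        case (hosts, p) if hosts.size > r => (hosts(r), p)
      }
    }
  }
}
```

With partitions located on (h1, h2), (h3) and (h2, h1), the order is h1/0, h3/1, h2/2, then h2/0, h1/2, so every partition is visited once before any partition is visited a second time.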
