How about generating the key with a one-way hash such as MD5?
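
A minimal sketch of what I mean, assuming Spark's Scala API. The names HashShuffle, md5Key, shuffleByHash and the salt parameter are made up for illustration, and it assumes x.toString is a stable, content-based representation of each element. Without a salt the placement would be identical on every run, so the salt is mixed into the hash and should change for each reshuffle. Equal elements still land in the same partition.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

import scala.reflect.ClassTag

import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD

object HashShuffle {
  // First 4 bytes of MD5(salt + ":" + x.toString), read as an Int.
  // Assumption: x.toString depends only on the element's content.
  def md5Key(salt: Long, x: Any): Int = {
    val digest = MessageDigest.getInstance("MD5")
      .digest(s"$salt:$x".getBytes(StandardCharsets.UTF_8))
    ByteBuffer.wrap(digest).getInt
  }

  // Hypothetical helper: key every element by its salted hash, then let
  // partitionBy move each pair to the partition chosen by that key.
  def shuffleByHash[T: ClassTag](rdd: RDD[T], salt: Long): RDD[T] =
    rdd
      .map(x => (md5Key(salt, x), x))
      .partitionBy(new HashPartitioner(rdd.partitions.length))
      .values // drop the temporary keys
}
```

Usage would be something like HashShuffle.shuffleByHash(yourrdd, salt = iteration), where iteration is whatever counter the algorithm already keeps.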

On Thu, Jun 18, 2015 at 9:59 PM, Guillaume Pitel <guillaume.pi...@exensa.com
> wrote:

>
> I think you can randomly reshuffle your elements just by emitting a random
> key (mapping a PairRdd's key triggers a reshuffle IIRC)
>
> yourrdd.map{ x => (rand(), x)}
>
> There is obviously a risk that rand() will give the same sequence of numbers in
> each partition, so you may need to use mapPartitionsWithIndex first and
> seed your rand with the partition id (or compute your rand from a seed
> based on x).
>
> Guillaume
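
The suggestion above could be sketched roughly as follows, assuming Spark's Scala API. RandomShuffle, shuffleRandomly and the seed parameter are hypothetical names, not anything from Spark itself. As far as I know, a plain map only attaches the keys and does not move any data, so the sketch calls partitionBy explicitly to force the shuffle.

```scala
import scala.reflect.ClassTag
import scala.util.Random
import scala.util.hashing.byteswap64

import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD

object RandomShuffle {
  // Hypothetical helper: `seed` is chosen by the caller and should differ
  // between reshuffles, otherwise every call produces the same assignment.
  def shuffleRandomly[T: ClassTag](rdd: RDD[T], seed: Long): RDD[T] = {
    val numPartitions = rdd.partitions.length
    rdd
      .mapPartitionsWithIndex { (index, iter) =>
        // One generator per partition. The partition index is mixed into the
        // seed so that partitions do not all draw the same number sequence.
        val rng = new Random(byteswap64(seed + index))
        iter.map(x => (rng.nextInt(numPartitions), x))
      }
      // The key is already a partition number in [0, numPartitions), and
      // HashPartitioner maps such an Int key to the partition of that number.
      .partitionBy(new HashPartitioner(numPartitions))
      .values // drop the temporary keys
  }
}
```

Because the generator is seeded from (seed, partition index), a recomputed task assigns the same keys again, which keeps the result consistent if a partition has to be rebuilt.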
>
> Hello,
>
> In the context of a machine learning algorithm, I need to be able to
> randomly distribute the elements of a large RDD across partitions (i.e.,
> essentially assign each element to a random partition). How could I achieve
> this? I have tried to call repartition() with the current number of
> partitions - but it seems to me that this moves only some of the elements,
> and in a deterministic way.
>
> I know this will be an expensive operation but I only need to perform it
> every once in a while.
>
> Thanks a lot!
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Best-way-to-randomly-distribute-elements-tp23391.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>



-- 
Best Regards,
Ayan Guha