How about generating the key using a one-way hash such as MD5?

On Thu, Jun 18, 2015 at 9:59 PM, Guillaume Pitel <guillaume.pi...@exensa.com> wrote:
> I think you can randomly reshuffle your elements just by emitting a random
> key (mapping a PairRDD's key triggers a reshuffle IIRC):
>
>     yourrdd.map{ x => (rand(), x) }
>
> There is obviously a risk that rand() will give the same sequence of numbers
> in each partition, so you may need to use mapPartitionsWithIndex first and
> seed your rand with the partition id (or compute your rand from a seed
> based on x).
>
> Guillaume
>
> > Hello,
> >
> > In the context of a machine learning algorithm, I need to be able to
> > randomly distribute the elements of a large RDD across partitions (i.e.,
> > essentially assign each element to a random partition). How could I achieve
> > this? I have tried to call repartition() with the current number of
> > partitions, but it seems to me that this moves only some of the elements,
> > and in a deterministic way.
> >
> > I know this will be an expensive operation but I only need to perform it
> > every once in a while.
> >
> > Thanks a lot!
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/Best-way-to-randomly-distribute-elements-tp23391.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.

--
Best Regards,
Ayan Guha
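
For reference, here is a minimal sketch of the two key-generation ideas discussed in this thread (a random key seeded per partition, and a key derived from an MD5 hash). It is written as plain Scala with no Spark dependency so it can be run on its own. The names RandomKeys, randomKeys, md5Key, baseSeed and numTargets are hypothetical and come neither from the thread nor from Spark. One caveat worth keeping in mind: map by itself is a narrow transformation and does not move any data, so the keyed RDD still has to be repartitioned (for example with partitionBy) before the elements actually change partition.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest
import scala.util.Random

object RandomKeys {
  // Idea 1: tag every element with a random target partition in [0, numTargets).
  // The generator is seeded from baseSeed plus the partition id, so two
  // partitions do not start from the same seed. Adjacent seeds are good enough
  // for a sketch; a stronger seed mixer may be preferable in practice.
  def randomKeys[T](baseSeed: Long, numTargets: Int,
                    partitionId: Int, elems: Iterator[T]): Iterator[(Int, T)] = {
    val rng = new Random(baseSeed + partitionId)
    elems.map(x => (rng.nextInt(numTargets), x))
  }

  // Idea 2: derive the target partition from an MD5 digest of the element
  // (assumed here to be representable as a String). This is deterministic:
  // the same input always yields the same key.
  def md5Key(x: String, numTargets: Int): Int = {
    val digest = MessageDigest.getInstance("MD5")
      .digest(x.getBytes(StandardCharsets.UTF_8))
    // Interpret the 16 digest bytes as a non-negative integer, then reduce it.
    (BigInt(1, digest) % numTargets).toInt
  }
}
```

Assuming an RDD named rdd and a target partition count n, the first helper could be plugged in roughly as rdd.mapPartitionsWithIndex((i, it) => RandomKeys.randomKeys(seed, n, i, it)).partitionBy(new HashPartitioner(n)).values, with seed changed between runs if a fresh assignment is wanted each time. The MD5 variant spreads elements roughly evenly, but since it is deterministic it will send each element to the same partition on every call, so it does not re-randomize the data.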