Re: Random pairs / RDD order

2015-04-19 Thread Aurélien Bellet
shuffle the random sample so far. Thanks a lot! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Random-pairs-RDD-order-tp22529.html Sent from the Apache Spark User List mailing list archive

Re: Random pairs / RDD order

2015-04-17 Thread Aurélien Bellet
not find a way to efficiently shuffle the random sample so far. Thanks a lot!

Random pairs / RDD order

2015-04-16 Thread abellet

Re: Random pairs / RDD order

2015-04-16 Thread Guillaume Pitel
Hi Aurelien, Sean's solution is nice, but maybe not completely order-free, since pairs will come from the same partition. The easiest / fastest way to do it, in my opinion, is to use a random key instead of a zipWithIndex. Of course you won't be able to ensure uniqueness of each element of
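
The random-key idea mentioned above can be sketched locally. The excerpt is cut off before any details, so everything below is an assumption for illustration: the function name random_pairs, the seed parameter, and the step that pairs consecutive elements after sorting are all hypothetical and not taken from the thread. It is plain Python on a list, not Spark code. On an RDD the same idea would presumably key each element by a random number and sort by that key.

```python
import random


def random_pairs(items, seed=None):
    """Pair up items in a random order by sorting on random keys.

    Hypothetical local stand-in for the random-key idea; not Spark code.
    """
    rng = random.Random(seed)
    # Attach an independent random key to every element,
    # instead of a positional index such as zipWithIndex would give.
    keyed = [(rng.random(), x) for x in items]
    # Sorting by the random key puts the elements in a random global order.
    # Keys are not guaranteed to be unique; ties keep their input order.
    keyed.sort(key=lambda kv: kv[0])
    shuffled = [x for _, x in keyed]
    # Assumption: pairs are formed from consecutive elements.
    # With an odd number of elements, the last one is left unpaired.
    return [(shuffled[i], shuffled[i + 1])
            for i in range(0, len(shuffled) - 1, 2)]
```

For example, random_pairs(range(10), seed=42) returns five disjoint pairs that together cover all ten inputs, and passing the same seed again reproduces the same pairs.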

Re: Random pairs / RDD order

2015-04-16 Thread Sean Owen
(Indeed, though the OP said it was a requirement that the pairs are drawn from the same partition.)

On Thu, Apr 16, 2015 at 11:14 PM, Guillaume Pitel guillaume.pi...@exensa.com wrote:
> Hi Aurelien, Sean's solution is nice, but maybe not completely order-free, since pairs will come from the

Re: Random pairs / RDD order

2015-04-16 Thread Sean Owen