sortByKey with multiple partitions

2015-04-08 Thread Tom
Hi,

If I perform sortByKey(true, 2).saveAsTextFile(filename) on a cluster,
will the data be sorted per partition, or in total? (And is this
guaranteed?)

Example:
Input 4,2,3,6,5,7

Sorted per partition:
part-0: 2,3,7
part-1: 4,5,6

Sorted in total:
part-0: 2,3,4 
part-1: 5,6,7

Thanks,

Tom

P.S. (I know that the data might not end up being uniformly distributed,
example: 4 elements in part-0 and 2 in part-1)



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/sortByKey-with-multiple-partitions-tp22426.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: sortByKey with multiple partitions

2015-04-08 Thread Ted Yu
See the scaladoc from OrderedRDDFunctions.scala:

 * Sort the RDD by key, so that each partition contains a sorted range of
 * the elements. Calling `collect` or `save` on the resulting RDD will
 * return or output an ordered list of records (in the `save` case, they
 * will be written to multiple `part-X` files in the filesystem, in order
 * of the keys).
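So sorting is total across partitions. Conceptually, Spark's RangePartitioner picks range boundaries so that every key in part-0 precedes every key in part-1, and each partition is then sorted locally. A minimal sketch in plain Python (no Spark; the real RangePartitioner chooses boundaries from a sample of the RDD, and the function name `sort_by_key` here is just for illustration):

```python
def sort_by_key(keys, num_partitions=2):
    # Pick range boundaries from the sorted keys. Real Spark uses a
    # sample of the RDD here; we use the full list for simplicity.
    sample = sorted(keys)
    boundaries = [sample[len(sample) * (i + 1) // num_partitions]
                  for i in range(num_partitions - 1)]
    # Range-partition: each key goes to the partition whose range holds it,
    # so every key in partition i is < every key in partition i+1.
    parts = [[] for _ in range(num_partitions)]
    for k in keys:
        parts[sum(k >= b for b in boundaries)].append(k)
    # Sort each partition locally; concatenating parts yields a total order.
    return [sorted(p) for p in parts]

print(sort_by_key([4, 2, 3, 6, 5, 7]))  # [[2, 3, 4], [5, 6, 7]]
```

This matches the "sorted in total" case from the question: part-0 holds 2,3,4 and part-1 holds 5,6,7, and reading the part files in order gives the fully sorted sequence.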

Cheers

On Wed, Apr 8, 2015 at 3:01 PM, Tom thubregt...@gmail.com wrote:

 Hi,

 If I perform sortByKey(true, 2).saveAsTextFile(filename) on a cluster,
 will the data be sorted per partition, or in total? (And is this
 guaranteed?)

 Example:
 Input 4,2,3,6,5,7

 Sorted per partition:
 part-0: 2,3,7
 part-1: 4,5,6

 Sorted in total:
 part-0: 2,3,4
 part-1: 5,6,7

 Thanks,

 Tom

 P.S. (I know that the data might not end up being uniformly distributed,
 example: 4 elements in part-0 and 2 in part-1)


