See the scaladoc for sortByKey in OrderedRDDFunctions.scala:
 * Sort the RDD by key, so that each partition contains a sorted range of the elements. Calling
 * `collect` or `save` on the resulting RDD will return or output an ordered list of records
 * (in the `save` case, they will be written to multiple `part-X` files in the filesystem, in
 * order of the keys).
So the output is sorted in total, as in your second example: each part file is sorted internally, and every key in part-0 precedes every key in part-1.
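A quick way to check this locally is to look at the contents of each partition after the sort. The sketch below assumes Spark 1.3 or later on the classpath and a local master; the object name `SortByKeyDemo` and the helper `sortedPartitions` are made up for illustration and are not part of Spark. It uses `glom()` to turn each partition into an array instead of writing `part-X` files.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical demo object, not part of Spark.
object SortByKeyDemo {

  // Sorts the keys with sortByKey and returns the contents of each
  // resulting partition, in partition order.
  def sortedPartitions(sc: SparkContext, input: Seq[Int], numPartitions: Int): Array[Array[Int]] =
    sc.parallelize(input, numPartitions)
      .map(k => (k, ()))                 // sortByKey needs key/value pairs
      .sortByKey(ascending = true, numPartitions = numPartitions)
      .keys
      .glom()                            // one array per partition
      .collect()

  def main(args: Array[String]): Unit = {
    // Local master with two threads, assumed here only for the demo.
    val conf = new SparkConf().setAppName("sortByKey-demo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val parts = sortedPartitions(sc, Seq(4, 2, 3, 6, 5, 7), 2)
      parts.zipWithIndex.foreach { case (part, i) =>
        println(s"part-$i: ${part.mkString(",")}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

Reading the partitions back in order should give the keys in ascending order overall. How many keys land in each partition depends on the range boundaries Spark picks by sampling, so the split is not necessarily even.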
Cheers
On Wed, Apr 8, 2015 at 3:01 PM, Tom thubregt...@gmail.com wrote:
Hi,
If I perform a sortByKey(true, 2).saveAsTextFile(filename) on a cluster,
will the data be sorted per partition, or in total? (And is this
guaranteed?)
Example:
Input 4,2,3,6,5,7
Sorted per partition:
part-0: 2,3,7
part-1: 4,5,6
Sorted in total:
part-0: 2,3,4
part-1: 5,6,7
Thanks,
Tom
P.S. I know that the data might not end up being uniformly distributed
(for example, 4 elements in part-0 and 2 in part-1).