[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

koert kuipers (JIRA) Fri, 19 Dec 2014 16:18:25 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254337#comment-14254337
 ]


koert kuipers commented on SPARK-3655:
--------------------------------------

Imran,
Thanks for taking the time to write this down!

Just to be clear: given
val x: RDD[X]
x.groupAndSort(f1, f2), where f1 is X => K and f2 is X => V, would produce a
SortedRDD[K, V]?

SortedRDD makes me think of OrderedRDD. The RDD you describe is partitioned
by key and sorted by (key, value). SecondarySortedRDD? Not nice either...
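
To make "partitioned by key and sorted by (key, value)" concrete, here is a minimal sketch on plain Scala collections rather than RDDs. The names Event, SecondarySortSketch and groupAndSortLocal, and the hash-based partition assignment, are assumptions made for illustration; none of them is Spark API or code from the pull request.

```scala
// Hypothetical record type standing in for X.
final case class Event(user: String, timestamp: Long)

object SecondarySortSketch {
  // Local stand-in for the proposed groupAndSort: every (key, value) pair is
  // assigned to a partition by its key only, then each partition is sorted
  // by (key, value).
  def groupAndSortLocal[X, K: Ordering, V: Ordering](
      xs: Seq[X], numPartitions: Int)(f1: X => K, f2: X => V): Vector[Vector[(K, V)]] = {
    val pairs = xs.map(x => (f1(x), f2(x)))

    // Non-negative modulo of the key's hash code (assumed partitioning rule).
    def partitionOf(k: K): Int = {
      val m = k.hashCode % numPartitions
      if (m < 0) m + numPartitions else m
    }

    val byPartition = pairs.groupBy { case (k, _) => partitionOf(k) }
    Vector.tabulate(numPartitions) { p =>
      // The tuple ordering compares keys first, then values.
      byPartition.getOrElse(p, Seq.empty[(K, V)]).sorted.toVector
    }
  }
}
```

Calling SecondarySortSketch.groupAndSortLocal(events, 2)(_.user, _.timestamp) puts all events of one user in a single partition, and within that partition each user's timestamps come out in ascending order.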

An implementation of what you suggest could be done pretty quickly with the
code in the current pull request. It is already more or less an existing
building block there.
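
For comparison, one way such an operation might be sketched on top of Spark's existing repartitionAndSortWithinPartitions (available from Spark 1.2.0) is shown below. This is not the code in the pull request. KeyOnlyPartitioner and the SecondarySort object are hypothetical names, the explicit numPartitions argument is an assumption, and the result is a plain RDD[(K, V)] because SortedRDD is only a proposed name.

```scala
import org.apache.spark.{HashPartitioner, Partitioner}
import org.apache.spark.SparkContext._ // implicit RDD conversions on Spark 1.2
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical partitioner: routes a composite (key, value) pair by its key
// component only, so all values of one key land in the same partition.
class KeyOnlyPartitioner(override val numPartitions: Int) extends Partitioner {
  private val byKey = new HashPartitioner(numPartitions)

  override def getPartition(composite: Any): Int = composite match {
    case (k, _) => byKey.getPartition(k)
    case other =>
      throw new IllegalArgumentException(s"expected a (key, value) pair, got: $other")
  }
}

object SecondarySort {
  // Sketch of groupAndSort: build the composite key (K, V), shuffle by K only,
  // and let the shuffle sort each partition by the full (K, V) ordering.
  def groupAndSort[X, K: Ordering: ClassTag, V: Ordering: ClassTag](
      x: RDD[X], numPartitions: Int)(f1: X => K, f2: X => V): RDD[(K, V)] =
    x.map(e => ((f1(e), f2(e)), ()))
      .repartitionAndSortWithinPartitions(new KeyOnlyPartitioner(numPartitions))
      .map(_._1)
}
```

The design choice here is to move the value into the shuffle key so that the sort-based shuffle does the ordering, while the partitioner ignores the value so that records with the same key stay together.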

Curious to hear what others think.


On Fri, Dec 19, 2014 at 5:15 PM, Imran Rashid (JIRA) <[email protected]> wrote:


> Support sorting of values in addition to keys (i.e. secondary sort)
> -------------------------------------------------------------------
>
>                 Key: SPARK-3655
>                 URL: https://issues.apache.org/jira/browse/SPARK-3655
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 1.1.0, 1.2.0
>            Reporter: koert kuipers
>            Assignee: Koert Kuipers
>            Priority: Minor
>
> Now that Spark has a sort-based shuffle, can we expect a secondary sort soon?
> There are some use cases where getting a sorted iterator of values per key is
> helpful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
