[jira] [Commented] (SPARK-2278) groupBy & groupByKey should support custom comparator

Sean Owen (JIRA) Mon, 14 Jul 2014 03:02:33 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060498#comment-14060498
 ]


Sean Owen commented on SPARK-2278:
----------------------------------

(Oh right, sortBy is not in 1.0.0, it came after: 
https://github.com/apache/spark/commit/b92d16b114fd49e881d09e7974ad57b2a0df2906 
)

You wouldn't need an ordering for groupBy, right? And an ascending flag isn't 
needed with custom comparators, I think, although it could be a convenience.

sortBy and sortByKey take an implicit Ordering parameter in Scala, which is the 
analog of Comparator. They also let you supply a function that transforms the 
elements or keys before sorting. This lets you sort on, say, a field of the 
objects.
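
As a rough sketch of what that looks like, here it is on a plain Scala List 
rather than an RDD, since the collection API has the same shape. Employee, its 
fields and the sample values are made up for illustration; nothing here is 
Spark code.

```scala
// Hypothetical record type, used only for illustration.
case class Employee(name: String, age: Int, salary: Int)

object SortBySketch {
  val staff = List(
    Employee("Ann", 41, 95000),
    Employee("Bob", 29, 70000),
    Employee("Cy", 35, 82000))

  // The function picks the key to sort on; the implicit Ordering[Int]
  // found by the compiler decides how two keys compare.
  val bySalary: List[Employee] = staff.sortBy(_.salary)

  // The same call with the Ordering passed explicitly, which is the
  // Scala counterpart of handing over a custom Comparator.
  val bySalaryDesc: List[Employee] =
    staff.sortBy(_.salary)(Ordering[Int].reverse)

  def main(args: Array[String]): Unit = {
    println(bySalary.map(_.name))     // List(Bob, Cy, Ann)
    println(bySalaryDesc.map(_.name)) // List(Ann, Cy, Bob)
  }
}
```

The second parameter list is where a custom comparison plugs in, which is 
why no separate Comparator type is needed on the Scala side.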

One gap I see is that the Java API can't expose the implicit Ordering 
parameter, as far as I can tell. You can still supply the mapping function, but 
must use the natural ordering of the mapped value to sort. That covers a lot of 
cases -- I can sort by employee salary descending -- but not all.
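
To illustrate the salary case with only a mapping function and the natural 
ordering, here is a small sketch, again on a plain Scala List with a made-up 
Employee type. The negation trick is my assumption about how one would get 
descending order this way, not something taken from the Spark API.

```scala
// Hypothetical record type, used only for illustration.
case class Employee(name: String, salary: Int)

object NaturalOrderSketch {
  val staff = List(
    Employee("Ann", 95000),
    Employee("Bob", 70000),
    Employee("Cy", 82000))

  // No custom Ordering is supplied: the keys are plain Ints compared in
  // their natural order. Negating the key flips the result to descending.
  val bySalaryDesc: List[Employee] =
    staff.sortBy(employee => -employee.salary)

  def main(args: Array[String]): Unit = {
    println(bySalaryDesc.map(_.name)) // List(Ann, Cy, Bob)
  }
}
```

This works because a numeric key has an obvious inverse mapping. A key type 
without one is the kind of case where a real Ordering would be needed.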

I don't know that the answer is to add the different Comparator interface into 
the mix. It seems more direct to try to expose Ordering if anything.

All of the above matches Scala's sortBy method, and I think that's an important 
driver. See 
http://www.scala-lang.org/api/2.10.3/index.html#scala.collection.immutable.List
It makes more sense if you have seen this API.

Same for groupBy and groupByKey. Rather than defining a custom equality 
function, you map each value to a different one whose equality defines how to 
group. For example, I can group on (employee => employee.age) rather than 
define a custom Comparator on Employee that checks age.
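
A minimal sketch of that grouping, using a plain Scala List in place of an RDD. 
Employee and the sample data are hypothetical, and the second grouping (by 
decade) is only there to show that changing the mapping changes what counts as 
"equal".

```scala
// Hypothetical record type, used only for illustration.
case class Employee(name: String, age: Int)

object GroupBySketch {
  val staff = List(
    Employee("Ann", 41),
    Employee("Bob", 29),
    Employee("Cy", 41))

  // Employees land in the same group when the mapped values (ages) are
  // equal; no custom equality on Employee is involved.
  val byAge: Map[Int, List[Employee]] =
    staff.groupBy(employee => employee.age)

  // A coarser notion of equality only needs a different mapping.
  val byDecade: Map[Int, List[Employee]] =
    staff.groupBy(employee => employee.age / 10 * 10)

  def main(args: Array[String]): Unit = {
    println(byAge(41).map(_.name))    // List(Ann, Cy)
    println(byDecade(20).map(_.name)) // List(Bob)
  }
}
```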

> groupBy & groupByKey should support custom comparator
> -----------------------------------------------------
>
>                 Key: SPARK-2278
>                 URL: https://issues.apache.org/jira/browse/SPARK-2278
>             Project: Spark
>          Issue Type: New Feature
>          Components: Java API
>    Affects Versions: 1.0.0
>            Reporter: Hans Uhlig
>
> To maintain parity with MapReduce you should be able to specify a custom key 
> equality function in groupBy/groupByKey similar to sortByKey. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
