[ https://issues.apache.org/jira/browse/SPARK-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060498#comment-14060498 ]
Sean Owen commented on SPARK-2278:
----------------------------------
(Oh right, sortBy is not in 1.0.0, it came after:
https://github.com/apache/spark/commit/b92d16b114fd49e881d09e7974ad57b2a0df2906
)
You wouldn't need an ordering for groupBy, right? And an ascending flag isn't
needed with custom comparators, I think, although it could be a convenience.
sortBy and sortByKey take an implicit Ordering parameter in Scala, which is the
analog of Comparator. They also let you supply a function that transforms the
elements or keys before sorting. This lets you sort on, say, a field of the
objects.
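To make the "key function plus ordering" shape concrete, here is a minimal
sketch over plain Java collections. It is not Spark API: the Employee class,
the sortBy helper and its signature are all hypothetical, and the Comparator
argument only stands in for the role the implicit Ordering plays in Scala.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

// Plain-Java illustration only; nothing here is Spark API.
final class SortByWithOrdering {

    // Hypothetical record type used only for this example.
    static final class Employee {
        final String name;
        final int salary;

        Employee(String name, int salary) {
            this.name = name;
            this.salary = salary;
        }

        @Override
        public String toString() {
            return name + ":" + salary;
        }
    }

    // Sorts a copy of items by keyFn(item), comparing keys with keyOrder.
    // keyOrder stands in for the implicit Ordering of Scala's sortBy.
    static <T, K> List<T> sortBy(List<T> items, Function<T, K> keyFn, Comparator<K> keyOrder) {
        List<T> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparing(keyFn, keyOrder));
        return sorted;
    }

    public static void main(String[] args) {
        List<Employee> staff = Arrays.asList(
            new Employee("bob", 50),
            new Employee("Carol", 70),
            new Employee("alice", 60));
        // Key is the name field; keys are compared case-insensitively
        // rather than by String's natural ordering.
        System.out.println(sortBy(staff, e -> e.name, String.CASE_INSENSITIVE_ORDER));
    }
}
```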
One gap I see is that the Java API can't expose the implicit Ordering
parameter? You can still supply the mapping function, but must use the natural
ordering of the mapped value to sort. That covers a lot of cases -- I can sort
by employee salary descending -- but not all.
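A rough sketch of the "mapping function only" case, again in plain Java
collections rather than Spark: the hypothetical sortBy helper below accepts no
comparator, so the mapped key must be Comparable and is sorted by its natural
ordering. Descending salary then falls out of negating the key. The Employee
class and all names here are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

// Plain-Java illustration only; nothing here is Spark API.
final class SortByNaturalOrder {

    // Hypothetical record type used only for this example.
    static final class Employee {
        final String name;
        final int salary;

        Employee(String name, int salary) {
            this.name = name;
            this.salary = salary;
        }

        @Override
        public String toString() {
            return name + ":" + salary;
        }
    }

    // Only a key function is accepted; keys are compared by natural ordering.
    static <T, K extends Comparable<K>> List<T> sortBy(List<T> items, Function<T, K> keyFn) {
        List<T> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparing(keyFn));
        return sorted;
    }

    public static void main(String[] args) {
        List<Employee> staff = Arrays.asList(
            new Employee("bob", 50),
            new Employee("Carol", 70),
            new Employee("alice", 60));
        // Negating the salary makes the ascending natural order of the key
        // correspond to descending salary.
        System.out.println(sortBy(staff, e -> -e.salary));
    }
}
```

The negation trick works for numeric keys. A key such as a String has no
equally simple inversion, which is the kind of case a natural-ordering-only
API leaves uncovered.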
I don't know that the answer is to add the different Comparator interface into
the mix. It seems more direct to try to expose Ordering if anything.
All of the above matches the sortBy method on Scala collections, and I think
that's an important driver of the design. See
http://www.scala-lang.org/api/2.10.3/index.html#scala.collection.immutable.List
It makes more sense if you have seen that API.
Same for groupBy and groupByKey. Rather than defining a custom equality
function, you map each value to a different one whose equality defines how to
group. For example, I can group on (employee => employee.age) rather than
define a custom Comparator on Employee that checks age.
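The grouping idea can be sketched the same way with plain Java collections
(not Spark; the Employee class and the groupBy helper are hypothetical). No
equality function is passed in: employees land in the same group exactly when
their mapped keys are equal.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-Java illustration only; nothing here is Spark API.
final class GroupByKeyFn {

    // Hypothetical record type used only for this example.
    static final class Employee {
        final String name;
        final int age;

        Employee(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public String toString() {
            return name + ":" + age;
        }
    }

    // Groups items by keyFn(item); the key's own equals/hashCode decide
    // which items end up together.
    static <T, K> Map<K, List<T>> groupBy(List<T> items, Function<T, K> keyFn) {
        return items.stream().collect(Collectors.groupingBy(keyFn));
    }

    public static void main(String[] args) {
        List<Employee> staff = Arrays.asList(
            new Employee("alice", 30),
            new Employee("bob", 41),
            new Employee("carol", 30));
        // Group on the age field instead of defining equality on Employee.
        Map<Integer, List<Employee>> byAge = groupBy(staff, e -> e.age);
        System.out.println(byAge);
    }
}
```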
> groupBy & groupByKey should support custom comparator
> -----------------------------------------------------
>
> Key: SPARK-2278
> URL: https://issues.apache.org/jira/browse/SPARK-2278
> Project: Spark
> Issue Type: New Feature
> Components: Java API
> Affects Versions: 1.0.0
> Reporter: Hans Uhlig
>
> To maintain parity with MapReduce you should be able to specify a custom key
> equality function in groupBy/groupByKey similar to sortByKey.