[ https://issues.apache.org/jira/browse/SPARK-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060267#comment-14060267 ]
Hans Uhlig commented on SPARK-2278:
-----------------------------------
So I just checked the current 1.0.0 API, and JavaPairRDD implements the
following (there was no sortBy that I could find):
JavaPairRDD<K,V> JavaPairRDD.sortByKey()
JavaPairRDD<K,V> JavaPairRDD.sortByKey(Comparator comp)
JavaPairRDD<K,V> JavaPairRDD.sortByKey(boolean ascending)
JavaPairRDD<K,V> JavaPairRDD.sortByKey(Comparator comp, boolean ascending)
JavaPairRDD<K,V> JavaPairRDD.sortByKey(Comparator comp, boolean ascending, int numPartitions)
JavaPairRDD<K,Iterable<T>> JavaRDD.groupBy(Function<T,K> arg0)
JavaPairRDD<K,Iterable<Tuple2<K,V>>> JavaPairRDD.groupBy(Function<Tuple2<K,V>,K> f)
JavaPairRDD<K,Iterable<T>> JavaRDD.groupBy(Function<T,K> arg0, int arg1)
JavaPairRDD<K,Iterable<Tuple2<K,V>>> JavaPairRDD.groupBy(Function<Tuple2<K,V>,K> f, int numPartitions)
JavaPairRDD.groupByKey()
JavaPairRDD.groupByKey(Partitioner partitioner)
JavaPairRDD.groupByKey(int numPartitions)
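To make the gap in the listing above concrete, here is a minimal plain-Java
sketch with no Spark dependency. The class and method names
(ComparatorVsHashGrouping, sortKeys, groupByEquals) are hypothetical and exist
only for illustration. It assumes that grouping relies on equals()/hashCode()
for key identity, whereas sorting can be driven by an arbitrary Comparator.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ComparatorVsHashGrouping {
    // Sorting accepts a Comparator, so keys that differ only in case end up adjacent.
    static List<String> sortKeys(List<String> keys, Comparator<String> comp) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(comp);
        return sorted;
    }

    // Grouping by equals()/hashCode() offers no such hook: "a" and "A" stay separate.
    static Map<String, List<Integer>> groupByEquals(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.add(new SimpleImmutableEntry<>("a", 1));
        pairs.add(new SimpleImmutableEntry<>("B", 2));
        pairs.add(new SimpleImmutableEntry<>("A", 3));

        // The case-insensitive comparator treats "a" and "A" as the same key for ordering.
        System.out.println(sortKeys(Arrays.asList("a", "B", "A"), String.CASE_INSENSITIVE_ORDER));
        // The equals-based grouping still yields three separate groups.
        System.out.println(groupByEquals(pairs));
    }
}
```

With a case-insensitive comparator the sort places "a" and "A" next to each
other, but the grouping keeps them as distinct keys, which is the mismatch
this issue is about.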
The base overloads (the ones with no implied or defaulted parameters) should
provide the following interfaces for maximum control and flexibility:
JavaRDD<K,V> JavaRDD.sortBy(Comparator comp, boolean ascending, Partitioner partitioner, int numPartitions)
JavaPairRDD<K,V> JavaPairRDD.sortByKey(Comparator comp, boolean ascending, Partitioner partitioner, int numPartitions)
JavaPairRDD<K,Iterable<T>> JavaRDD.groupBy(Function<T,K> func, Comparator comp, boolean ascending, Partitioner partitioner, int numPartitions)
JavaPairRDD<K,Iterable<V>> JavaPairRDD.groupByKey(Function<T,K> func, Comparator comp, boolean ascending, Partitioner partitioner, int numPartitions)
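As a rough illustration of what a comparator-aware groupByKey could mean, the
plain-Java sketch below groups pairs in a TreeMap, so two keys land in the same
group whenever the Comparator returns 0 for them. It does not use Spark:
ComparatorGrouping.groupByKey is a hypothetical helper, the ascending flag
simply mirrors the proposed parameter, and Partitioner/numPartitions are left
out because they only make sense on a cluster.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ComparatorGrouping {
    // Hypothetical helper: key identity and group order both come from the comparator.
    static <K, V> Map<K, List<V>> groupByKey(
            List<Map.Entry<K, V>> pairs, Comparator<K> comp, boolean ascending) {
        Comparator<K> order = ascending ? comp : comp.reversed();
        // TreeMap treats keys as equal when order.compare(a, b) == 0.
        Map<K, List<V>> groups = new TreeMap<>(order);
        for (Map.Entry<K, V> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }
}
```

Called with the pairs ("a", 1), ("B", 2), ("A", 3) and
String.CASE_INSENSITIVE_ORDER, this sketch produces two groups instead of
three, with the first key seen ("a") standing in for the merged group.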
The function reference passed to groupByKey should look something like
"Iterable<O> Function<K,V,O>(K key, Iterable<V> values)", unless there is a
different function for that particular job that I am missing. The lack of
descriptions of what the inputs and outputs of the function references should
be makes that a bit difficult to discern sometimes.
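The function shape quoted above could be written as a Java interface roughly
like the sketch below. GroupFunction and applyToGroups are hypothetical names,
not part of the Spark API. The sketch only shows the intended contract, in
which each key and all of its grouped values are handed to the function once,
much like a MapReduce reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: receives one key plus all of its values, returns zero or more outputs.
interface GroupFunction<K, V, O> {
    Iterable<O> call(K key, Iterable<V> values);
}

final class GroupFunctionDemo {
    // Calls f once per group and concatenates whatever each call returns.
    static <K, V, O> List<O> applyToGroups(Map<K, List<V>> groups, GroupFunction<K, V, O> f) {
        List<O> out = new ArrayList<>();
        for (Map.Entry<K, List<V>> g : groups.entrySet()) {
            for (O o : f.call(g.getKey(), g.getValue())) {
                out.add(o);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        groups.put("a", Arrays.asList(1, 3));
        groups.put("b", Arrays.asList(2));

        // Emit one "key=sum" string per group.
        List<String> sums = applyToGroups(groups, (key, values) -> {
            int sum = 0;
            for (int v : values) {
                sum += v;
            }
            return Collections.singletonList(key + "=" + sum);
        });
        System.out.println(sums); // [a=4, b=2]
    }
}
```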
> groupBy & groupByKey should support custom comparator
> -----------------------------------------------------
>
> Key: SPARK-2278
> URL: https://issues.apache.org/jira/browse/SPARK-2278
> Project: Spark
> Issue Type: New Feature
> Components: Java API
> Affects Versions: 1.0.0
> Reporter: Hans Uhlig
>
> To maintain parity with MapReduce you should be able to specify a custom key
> equality function in groupBy/groupByKey similar to sortByKey.
--
This message was sent by Atlassian JIRA
(v6.2#6252)