[GitHub] flink pull request #1856: FLINK-3650 Add maxBy/minBy to Scala DataSet API

ramkrish86 Wed, 15 Jun 2016 11:22:04 -0700

Github user ramkrish86 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1856#discussion_r67216772
  
    --- Diff: flink-scala/src/main/scala/org/apache/flink/api/scala/DataSet.scala ---
    @@ -699,6 +700,55 @@ class DataSet[T: ClassTag](set: JavaDataSet[T]) {
       }
     
       /**
    +    * Selects an element with minimum value.
    +    *
    +    * The minimum is computed over the specified fields in lexicographical order.
    +    *
    +    * Example 1: Given a data set with elements [0, 1], [1, 0], the
    +    * results will be:
    +    *
    +    * minBy(0)[0, 1]
    +    * minBy(1)[1, 0]
    +    * Example 2: Given a data set with elements [0, 0], [0, 1], the
    +    * results will be:
    +    * minBy(0, 1)[0, 0]
    +    * If multiple values with minimum value at the specified fields exist, a random one will be
    +    * picked.
    +    * Internally, this operation is implemented as a {@link ReduceFunction}.
    +    */
    +  def minBy(fields: Int*) : DataSet[T]  = {
    +    if (!getType.isTupleType) {
    +      throw new InvalidProgramException("DataSet#minBy(int...) only works on Tuple types.")
    +    }
    +
    +    reduce(new SelectByMinFunction[T](getType.asInstanceOf[TupleTypeInfoBase[T]], fields.toArray))
    +  }
    +
    +  /**
    +    * Selects an element with maximum value.
    +    *
    +    * The maximum is computed over the specified fields in lexicographical order.
    +    *
    +    * Example 1: Given a data set with elements [0, 1], [1, 0], the
    +    * results will be:
    +    *
    +    * maxBy(0)[1, 0]
    +    * maxBy(1)[0, 1]
    +    * Example 2: Given a data set with elements [0, 0], [0, 1], the
    +    * results will be:
    +    * maxBy(0, 1)[0, 1]
    +    * If multiple values with maximum value at the specified fields exist, a random one will be
    +    * picked
    +    * Internally, this operation is implemented as a {@link ReduceFunction}.
    +    *
    +    */
    +  def maxBy(fields: Int*) : DataSet[T] = {
    +    if (!getType.isTupleType) {
    +      throw new InvalidProgramException("DataSet#maxBy(int...) only works on Tuple types.")
    +    }
    +    reduce(new SelectByMaxFunction[T](getType.asInstanceOf[TupleTypeInfoBase[T]], fields.toArray))
    +  }
    --- End diff --
    
    Those are very sharp eyes :)
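
For anyone reading the diff without a Flink build at hand, the lexicographical selection described in the Scaladoc can be sketched in plain Scala. This is only an illustration under stated assumptions: `SelectBySketch` and `compareOn` are hypothetical names, rows are modelled as `Seq[Int]` rather than Flink tuples, and the sketch does not use `SelectByMinFunction` or `SelectByMaxFunction` from the diff. It keeps the first element on a tie, whereas the Scaladoc only promises that some element with the extreme value is picked.

```scala
object SelectBySketch {
  // Hypothetical helper (not part of Flink): compares two rows on the given
  // field positions, in order, and returns the first non-zero comparison
  // result, or 0 if the rows agree on all listed fields.
  def compareOn(a: Seq[Int], b: Seq[Int], fields: Seq[Int]): Int =
    fields.iterator
      .map(f => Integer.compare(a(f), b(f)))
      .find(_ != 0)
      .getOrElse(0)

  // Pairwise reduce that keeps the smaller row, mirroring the idea of
  // implementing minBy as a reduce. Assumes `rows` is non-empty.
  def minBy(rows: Seq[Seq[Int]], fields: Int*): Seq[Int] =
    rows.reduce((a, b) => if (compareOn(a, b, fields) <= 0) a else b)

  // Same as above, but keeps the larger row.
  def maxBy(rows: Seq[Seq[Int]], fields: Int*): Seq[Int] =
    rows.reduce((a, b) => if (compareOn(a, b, fields) >= 0) a else b)

  def main(args: Array[String]): Unit = {
    val example1 = Seq(Seq(0, 1), Seq(1, 0))
    val example2 = Seq(Seq(0, 0), Seq(0, 1))

    println(minBy(example1, 0))    // List(0, 1)
    println(minBy(example1, 1))    // List(1, 0)
    println(minBy(example2, 0, 1)) // List(0, 0)
    println(maxBy(example1, 0))    // List(1, 0)
    println(maxBy(example1, 1))    // List(0, 1)
    println(maxBy(example2, 0, 1)) // List(0, 1)
  }
}
```

The printed rows match Example 1 and Example 2 in the Scaladoc for both methods.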


