GitHub user ramkrish86 opened a pull request:
https://github.com/apache/flink/pull/1856
FLINK-3650 Add maxBy/minBy to Scala DataSet API
This PR exposes the maxBy/minBy API on the Scala DataSet. One thing to note
is that in the existing Scala DataSet API, groupBy() returns a
GroupedDataSet, whereas in the Java DataSet API it returns an UnsortedGrouping.
The Java-style groupBy() code in the Scala DataSet is already commented out:
` // public UnsortedGrouping<T> groupBy(String... fields) {
// new UnsortedGrouping<T>(this, new Keys.ExpressionKeys<T>(fields, getType()));
// }
`
The UnsortedGrouping internally has maxBy and minBy.
So in this PR I have not tried to change those, and hence the test cases do
not cover a groupBy() clause followed by maxBy or minBy (those cases are
currently covered only in the Java-based MaxOperatorTest class).
Please review and provide valuable feedback.
Please also note that SelectByMaxFunction and SelectByMinFunction have been
changed to support all Tuples, but the API itself checks whether the type is a Tuple.
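For reviewers less familiar with what maxBy selects, here is a minimal plain-Java sketch of the field-by-field comparison that a select-by-max reduce is generally understood to perform. It is an illustration only, not the code in this PR: the class MaxBySketch, the helpers selectMax and maxBy, and the use of Object[] as a stand-in for a tuple are all hypothetical, and the tie-breaking rule (keep the first row) is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; does not use any Flink classes.
class MaxBySketch {

    // Reduce step: returns whichever row is larger when the given field
    // positions are compared in order (Object[] stands in for a tuple).
    @SuppressWarnings({"unchecked", "rawtypes"})
    static Object[] selectMax(Object[] a, Object[] b, int... fields) {
        for (int pos : fields) {
            int cmp = ((Comparable) a[pos]).compareTo(b[pos]);
            if (cmp > 0) {
                return a;
            }
            if (cmp < 0) {
                return b;
            }
            // Equal on this field: fall through to the next field.
        }
        return a; // Assumption: on a full tie, keep the first row.
    }

    // Applies the reduce step across all rows (rows must be non-empty).
    static Object[] maxBy(List<Object[]> rows, int... fields) {
        Object[] best = rows.get(0);
        for (Object[] row : rows.subList(1, rows.size())) {
            best = selectMax(best, row, fields);
        }
        return best;
    }

    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
            new Object[] {1, 5L, "a"},
            new Object[] {3, 2L, "b"},
            new Object[] {3, 7L, "c"});
        // Compare on field 0 first, then on field 1 to break ties.
        System.out.println(Arrays.toString(maxBy(rows, 0, 1))); // prints [3, 7, c]
    }
}
```

A minBy variant would flip the two comparisons. A grouped version would apply the same reduce step within each key group, which is the case the tests in this PR do not cover.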
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ramkrish86/flink FLINK-3650
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/1856.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1856
----
commit 1b46ebfa3489432adde5a032c892dd5ec6c6d61c
Author: Vasudevan <[email protected]>
Date: 2016-04-06T06:13:07Z
FLINK-3650 Add maxBy/minBy to Scala DataSet API
----