[
https://issues.apache.org/jira/browse/FLINK-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047498#comment-14047498
]
Fabian Hueske commented on FLINK-970:
-------------------------------------
Yes, this looks good in principle.
However, you need to take the degree of parallelism (DOP) of the operator into
account. Otherwise, you'll get dop*n instead of n result tuples. A UDF can look up
its parallel task id and the total number of parallel tasks via its execution environment.
Also, I'd add a combiner to reduce the volume of data that is shipped.
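To make the dop*n point concrete, here is a minimal sketch in plain Java. It only simulates parallel tasks as lists and does not use the Flink API; the class and method names (FirstNSketch, localFirstN, naiveFirstN, firstN) are hypothetical and chosen for illustration. The per-task truncation plays the role of the combiner, and the single final truncation brings the result back down to n.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation of the idea; no Flink classes are involved and
// all names here are hypothetical.
class FirstNSketch {

    // Combiner-style step: one parallel task keeps at most n of its local elements.
    static <T> List<T> localFirstN(List<T> partition, int n) {
        return new ArrayList<>(partition.subList(0, Math.min(n, partition.size())));
    }

    // Naive variant: the union of all per-task results, which can hold up to dop * n elements.
    static <T> List<T> naiveFirstN(List<List<T>> partitions, int n) {
        List<T> merged = new ArrayList<>();
        for (List<T> partition : partitions) {
            merged.addAll(localFirstN(partition, n));
        }
        return merged;
    }

    // Corrected variant: truncate per task first, then truncate once more in a single final step.
    static <T> List<T> firstN(List<List<T>> partitions, int n) {
        return localFirstN(naiveFirstN(partitions, n), n);
    }

    public static void main(String[] args) {
        // Three simulated parallel tasks (dop = 3).
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("d", "e", "f"),
                Arrays.asList("g"));
        System.out.println(naiveFirstN(partitions, 2)); // [a, b, d, e, g]
        System.out.println(firstN(partitions, 2));      // [a, b]
    }
}
```

With the local truncation in place, each task ships at most n elements to the final step instead of its whole partition.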
> Implement a first(n) operator
> -----------------------------
>
> Key: FLINK-970
> URL: https://issues.apache.org/jira/browse/FLINK-970
> Project: Flink
> Issue Type: New Feature
> Reporter: Timo Walther
> Priority: Minor
>
> It is only syntactic sugar, but I had many cases where I just needed the
> first element or the first 2 elements in a GroupReduce.
> E.g., instead of
> {code:java}
> .reduceGroup(new GroupReduceFunction<String, String>() {
>     @Override
>     public void reduce(Iterator<String> values, Collector<String> out) throws Exception {
>         out.collect(values.next());
>     }
> })
> {code}
> one could simply write
> {code:java}
> .first()
> {code}