[ 
https://issues.apache.org/jira/browse/FLINK-1450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296968#comment-14296968
 ] 

Aljoscha Krettek commented on FLINK-1450:
-----------------------------------------

I see the necessity in the streaming API. Also, in the streaming API we don't 
have to worry about making this "combinable" the way we can make a GroupReduce 
operation combinable in the batch API.

When we want to add this to the batch API, we have to think about how to 
make it combinable. I just thought about a similar problem: I want to have an 
"aggregation" that combines elements into a list of elements, i.e.:

{code}
DataSet<Tuple2<String, Integer>> in = ...;
// COLLECT is the proposed aggregation; it does not exist yet.
DataSet<Tuple2<String, List<Integer>>> result = in.groupBy(0).aggregate(1, COLLECT);
{code}

The problem here is that the output of the combiner would be pre-aggregated 
lists, which would not match the expected reducer input of single items. I 
thought about providing a local combiner as an explicit operation; the 
program could then be expressed as a local combine followed by a reduce with a 
different input type. Here, the local combiner would be a fold, while the 
reducer is a classic shuffled reduce.
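
To make the idea a bit more concrete, here is a rough plain-Java sketch of the 
two steps, without any Flink classes. The names CollectSketch, localFold and 
reduce are made up for illustration, and the shuffle is only simulated by 
passing the per-partition results to the reduce step. It is not meant as a 
proposal for the actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CollectSketch {

    // Local combine step, written as a fold over one partition:
    // the inputs are single Integers, the accumulator is a List<Integer> per key.
    static Map<String, List<Integer>> localFold(List<Map.Entry<String, Integer>> partition) {
        Map<String, List<Integer>> acc = new TreeMap<>();
        for (Map.Entry<String, Integer> e : partition) {
            acc.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return acc;
    }

    // Reduce step after the (simulated) shuffle: input and output types are
    // both lists, so it simply concatenates the pre-aggregated lists per key.
    static Map<String, List<Integer>> reduce(List<Map<String, List<Integer>>> combined) {
        Map<String, List<Integer>> result = new TreeMap<>();
        for (Map<String, List<Integer>> partial : combined) {
            for (Map.Entry<String, List<Integer>> e : partial.entrySet()) {
                result.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two partitions of (key, value) pairs, standing in for parallel inputs.
        List<Map.Entry<String, Integer>> p1 =
                List.of(Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3));
        List<Map.Entry<String, Integer>> p2 =
                List.of(Map.entry("a", 4), Map.entry("b", 5));

        Map<String, List<Integer>> result = reduce(List.of(localFold(p1), localFold(p2)));
        System.out.println(result); // {a=[1, 3, 4], b=[2, 5]}
    }
}
```

The point is that localFold changes the type (Integer to List<Integer>), 
while reduce keeps it, which is why the two steps cannot share one function 
the way a combinable GroupReduce does.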

I hope I'm making sense to you somehow. :D

> Add Fold operator to the Streaming api
> --------------------------------------
>
>                 Key: FLINK-1450
>                 URL: https://issues.apache.org/jira/browse/FLINK-1450
>             Project: Flink
>          Issue Type: New Feature
>          Components: Streaming
>    Affects Versions: 0.9
>            Reporter: Gyula Fora
>            Priority: Minor
>              Labels: starter
>
> The streaming API currently doesn't support a fold operator.
> This operator would work as the foldLeft method in Scala. This would allow 
> effective implementations in a lot of cases where a simple reduce is 
> inappropriate due to different return types.
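
The foldLeft semantics mentioned in the issue description can be sketched in 
plain Java as follows. This is only an illustration of the difference in 
return types; FoldSketch and foldLeft are hypothetical names, not part of the 
streaming API.

```java
import java.util.List;
import java.util.function.BiFunction;

public class FoldSketch {

    // A fold takes an initial value of type R and combines it with each
    // element of type T, so the result type may differ from the element type.
    // A reduce, in contrast, is restricted to (T, T) -> T.
    static <T, R> R foldLeft(List<T> elements, R initial, BiFunction<R, T, R> f) {
        R acc = initial;
        for (T element : elements) {
            acc = f.apply(acc, element);
        }
        return acc;
    }

    public static void main(String[] args) {
        // Integers in, String out: not expressible as a plain reduce.
        String joined = foldLeft(List.of(1, 2, 3), "", (acc, x) -> acc + x);
        System.out.println(joined); // 123
    }
}
```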



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
