[
https://issues.apache.org/jira/browse/FLINK-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094345#comment-14094345
]
Stephan Ewen commented on FLINK-1043:
-------------------------------------
In some cases (non grouped operations), you might be able to solve this problem
using a combination of "partition map" -> "reduce". The partitionMap will be
added in the next version (see pull request below).
https://github.com/apache/incubator-flink/pull/42
> Alternative combine interface
> -----------------------------
>
> Key: FLINK-1043
> URL: https://issues.apache.org/jira/browse/FLINK-1043
> Project: Flink
> Issue Type: Wish
> Reporter: Sebastian Kruse
>
> The GroupReduce allows for the following combination reduce step:
> {{InputType}} -> combine -> {{InputType}} -> reduce -> {{OutputType}}.
> However, in the use cases I have stumbled upon so far, it would make more
> sense to have the following steps: {{InputType}} -> {{OutputType}} ->
> {{OutputType}}. It seems more intuitive to me to create a set of partial
> results with the combiners that will finally merged within the reduce phase
> into an overall result. This sometimes bars me from using a combiner.
> I provide some examples for this intuition.
> * WordCount
> ** If you want to implement WordCount with as a combinable GroupReduce, then
> you have to preprocess all words as {{Tuple2<String, 1>}}. This could be
> avoided if the combination result was not necessarily equal to the input type.
> * create a Bloom filter
> ** Bloom filters can be created locally on each node and later on be merged
> into a final, global Bloom filter, thus lend themselves for a combine-reduce
> proceeding. Doing this with a combinable GroupReduce would currently require
> to turn each input element into a singleton Bloom filter before the
> combination phase.
> Therefore, it would be nice to have the ability to use {{OutputType}} as the
> combiner result.
--
This message was sent by Atlassian JIRA
(v6.2#6252)