[
https://issues.apache.org/jira/browse/FLINK-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090803#comment-14090803
]
Sebastian Kruse commented on FLINK-1043:
----------------------------------------
I see. Another approach might be to have a reduce-merge interface instead of
the classical combine-reduce. Oftentimes, the combine and reduce functions are
logically equivalent (for associative, commutative functions), only the combine
function works only on the subset of the complete dataset. Therefore, one could
the reduce step as core logic but a subsequent merge step as the optional part.
> Alternative combine interface
> -----------------------------
>
> Key: FLINK-1043
> URL: https://issues.apache.org/jira/browse/FLINK-1043
> Project: Flink
> Issue Type: Wish
> Reporter: Sebastian Kruse
>
> The GroupReduce allows for the following combination reduce step:
> {{InputType}} -> combine -> {{InputType}} -> reduce -> {{OutputType}}.
> However, in the use cases I have stumbled upon so far, it would make more
> sense to have the following steps: {{InputType}} -> {{OutputType}} ->
> {{OutputType}}. It seems more intuitive to me to create a set of partial
> results with the combiners that will finally merged within the reduce phase
> into an overall result. This sometimes bars me from using a combiner.
> I provide some examples for this intuition.
> * WordCount
> ** If you want to implement WordCount with as a combinable GroupReduce, then
> you have to preprocess all words as {{Tuple2<String, 1>}}. This could be
> avoided if the combination result was not necessarily equal to the input type.
> * create a Bloom filter
> ** Bloom filters can be created locally on each node and later on be merged
> into a final, global Bloom filter, thus lend themselves for a combine-reduce
> proceeding. Doing this with a combinable GroupReduce would currently require
> to turn each input element into a singleton Bloom filter before the
> combination phase.
> Therefore, it would be nice to have the ability to use {{OutputType}} as the
> combiner result.
--
This message was sent by Atlassian JIRA
(v6.2#6252)