Max Michels created FLINK-1621: ---------------------------------- Summary: Create a generalized combine function Key: FLINK-1621 URL: https://issues.apache.org/jira/browse/FLINK-1621 Project: Flink Issue Type: Improvement Components: Distributed Runtime Affects Versions: 0.9 Reporter: Max Michels Fix For: 0.9
Flink allows combiners which accept a type {{I}} and combine the values of this type into type {{O}}. In Google Dataflow, combiners are more generalized. They accept an Input {{I}}, produce an intermediate combine value of {{T}}, and finally an output {{O}}. Flink's combiners are like the {{SimpleCombineFn}} in Google Dataflow. Right now, we translate the {{KeyedCombineFn}} into a {{SortPartition}} followed by a {{MapPartition}} to emulate the Combiner's behavior. Rudimentary performance tests showed that this behavior causes a significant increase in run time compared to the proper Combine implementation. Let's implement a more generalized Combiner to create a better mapping from Google Dataflow to Flink. -- This message was sent by Atlassian JIRA (v6.3.4#6332)