GitHub user ramkrish86 commented on a diff in the pull request:
https://github.com/apache/flink/pull/1553#discussion_r54329786
--- Diff:
flink-optimizer/src/main/java/org/apache/flink/optimizer/operators/GroupReduceWithCombineProperties.java
---
@@ -87,19 +92,39 @@ public DriverStrategy getStrategy() {
return DriverStrategy.SORTED_GROUP_REDUCE;
}
- @Override
public SingleInputPlanNode instantiate(Channel in, SingleInputNode node) {
if (in.getShipStrategy() == ShipStrategyType.FORWARD) {
// adjust a sort (changes grouping, so it must be for this driver to combining sort
- if (in.getLocalStrategy() == LocalStrategy.SORT) {
- if (!in.getLocalStrategyKeys().isValidUnorderedPrefix(this.keys)) {
- throw new RuntimeException("Bug: Inconsistent sort for group strategy.");
+ if(in.getSource().getOptimizerNode() instanceof PartitionNode) {
+ // Inject a combiner before the partition node
+ Channel toCombiner = new Channel(in.getSource());
+ toCombiner.setShipStrategy(ShipStrategyType.FORWARD, DataExchangeMode.PIPELINED);
+ GroupReduceNode combinerNode = ((GroupReduceNode) node).getCombinerUtilityNode();
+ combinerNode.setParallelism(in.getSource().getParallelism());
+ if(toCombiner.getSource().getInputs().iterator().hasNext()) {
+ Channel source = toCombiner.getSource().getInputs().iterator().next();
+ // A combiner plan node is created with the map as the input
+ SingleInputPlanNode combiner = new SingleInputPlanNode(combinerNode, "Combine("+node.getOperator()
+ .getName()+")", source, DriverStrategy.SORTED_GROUP_COMBINE);
+ addCombinerNodeData(in, toCombiner, combiner);
+ Channel combinerChannel = new Channel(combiner);
+ combinerChannel.setShipStrategy(ShipStrategyType.FORWARD, DataExchangeMode.PIPELINED);
--- End diff --
If I am not wrong, the ShipStrategyType and DataExchangeMode that we set on
the combinerChannel should be the ones associated with the 'in' channel that is
passed to the method? In the WordCount example, those would be FORWARD and
PIPELINED?
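
To make the question concrete, here is a minimal sketch of copying the settings
from 'in' instead of hard-coding them. The enums and the Channel class below are
simplified stand-ins written for this illustration, not the real Flink optimizer
classes, and combinerChannelLike is a hypothetical helper name. Note that inside
the enclosing FORWARD check the ship strategy of 'in' is already FORWARD, so only
the DataExchangeMode could actually differ.

```java
// Illustration only: stand-in types, not the real Flink optimizer classes.
public class CombinerChannelSketch {

    public enum ShipStrategyType { FORWARD, PARTITION_HASH }

    public enum DataExchangeMode { PIPELINED, BATCH }

    /** Minimal stand-in for a channel that carries ship strategy and exchange mode. */
    public static final class Channel {
        private ShipStrategyType shipStrategy;
        private DataExchangeMode dataExchangeMode;

        public void setShipStrategy(ShipStrategyType strategy, DataExchangeMode mode) {
            this.shipStrategy = strategy;
            this.dataExchangeMode = mode;
        }

        public ShipStrategyType getShipStrategy() {
            return shipStrategy;
        }

        public DataExchangeMode getDataExchangeMode() {
            return dataExchangeMode;
        }
    }

    /** Hypothetical helper: build the combiner channel with the settings of 'in'. */
    public static Channel combinerChannelLike(Channel in) {
        Channel combinerChannel = new Channel();
        // Copy both values from 'in' rather than hard-coding FORWARD / PIPELINED.
        combinerChannel.setShipStrategy(in.getShipStrategy(), in.getDataExchangeMode());
        return combinerChannel;
    }

    public static void main(String[] args) {
        Channel in = new Channel();
        in.setShipStrategy(ShipStrategyType.FORWARD, DataExchangeMode.PIPELINED);

        Channel combinerChannel = combinerChannelLike(in);
        // prints "FORWARD / PIPELINED"
        System.out.println(combinerChannel.getShipStrategy()
                + " / " + combinerChannel.getDataExchangeMode());
    }
}
```

If 'in' is always FORWARD and PIPELINED at this point, the hard-coded call and
the copied one behave the same; the copy only matters if another exchange mode
can reach this branch.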