GitHub user aljoscha opened a pull request: https://github.com/apache/flink/pull/1589
[FLINK-3336] Add Semi-Rebalance Data Shipping for DataStream The name of this is still up for discussion but I'm opening this so that people can look at the implementation and especially, the unit test. This is the Javadoc of DataStream.semiRebalance() that describes the behaviour: Sets the partitioning of the {@link DataStream} so that the output elements are distributed evenly to a subset of instances of the next operation in a round-robin fashion. The subset of downstream operations to which the upstream operation sends elements depends on the degree of parallelism of both the upstream and downstream operation. For example, if the upstream operation has parallelism 2 and the downstream operation has parallelism 4, then one upstream operation would distribute elements to two downstream operations while the other upstream operation would distribute to the other two downstream operations. If, on the other hand, the downstream operation has parallelism 2 while the upstream operation has parallelism 4 then two upstream operations will distribute to one downstream operation while the other two upstream operations will distribute to the other downstream operations. In cases where the different parallelisms are not multiples of each other one or several downstream operations will have a differing number of inputs from upstream operations. You can merge this pull request into a Git repository by running: $ git pull https://github.com/aljoscha/flink pattern-x Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1589.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1589 ---- commit 57c0d9ed044560721bf3d6bb465e48d3c2de555a Author: Aljoscha Krettek <aljoscha.kret...@gmail.com> Date: 2016-02-04T17:33:01Z [FLINK-3336] Add Semi-Rebalance Data Shipping for DataStream This is the Javadoc of DataStream.semiRebalance() that describes the behaviour: Sets the partitioning of the {@link DataStream} so that the output elements are distributed evenly to a subset of instances of the next operation in a round-robin fashion. The subset of downstream operations to which the upstream operation sends elements depends on the degree of parallelism of both the upstream and downstream operation. For example, if the upstream operation has parallelism 2 and the downstream operation has parallelism 4, then one upstream operation would distribute elements to two downstream operations while the other upstream operation would distribute to the other two downstream operations. If, on the other hand, the downstream operation has parallelism 2 while the upstream operation has parallelism 4 then two upstream operations will distribute to one downstream operation while the other two upstream operations will distribute to the other downstream operations. In cases where the different parallelisms are not multiples of each other one or several downstream operations will have a differing number of inputs from upstream operations. ---- --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---