[ 
https://issues.apache.org/jira/browse/FLINK-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132830#comment-15132830
 ] 

ASF GitHub Bot commented on FLINK-3336:
---------------------------------------

GitHub user aljoscha opened a pull request:

    https://github.com/apache/flink/pull/1589

    [FLINK-3336] Add Semi-Rebalance Data Shipping for DataStream

    The name of this is still up for discussion but I'm opening this so that
    people can look at the implementation and especially, the unit test.
    
    This is the Javadoc of DataStream.semiRebalance() that describes the
    behaviour:
    
    Sets the partitioning of the {@link DataStream} so that the output elements
    are distributed evenly to a subset of instances of the next operation in a
    round-robin fashion.
    
    The subset of downstream operations to which the upstream operation sends
    elements depends on the degree of parallelism of both the upstream and
    downstream operation. For example, if the upstream operation has
    parallelism 2 and the downstream operation has parallelism 4, then one
    upstream operation will distribute elements to two downstream operations
    while the other upstream operation will distribute to the other two
    downstream operations. If, on the other hand, the downstream operation has
    parallelism 2 while the upstream operation has parallelism 4, then two
    upstream operations will distribute to one downstream operation while the
    other two upstream operations will distribute to the other downstream
    operation.
    
    In cases where the parallelisms are not multiples of each other, one or
    several downstream operations will have a differing number of inputs from
    upstream operations.
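
To make the distribution described above concrete, here is a minimal standalone sketch. It is not the code from this pull request: the class name `SemiRebalanceSketch`, the helpers `targets` and `RoundRobin`, and the rule used to pick each subset are assumptions chosen only so that the result matches the 2-to-4 and 4-to-2 examples. The actual implementation may assign channels differently, particularly when the parallelisms are not multiples of each other.

```java
import java.util.ArrayList;
import java.util.List;

public class SemiRebalanceSketch {

    /**
     * Returns the downstream instance indices that one upstream instance sends to.
     *
     * Assumption (illustrative only): instance i of an operation with
     * parallelism p owns the slice [i/p, (i+1)/p) of the unit interval, and an
     * upstream instance is connected to every downstream instance whose slice
     * overlaps its own. Cross-multiplying keeps the comparison in integers.
     */
    public static List<Integer> targets(int upstreamIndex,
                                        int upstreamParallelism,
                                        int downstreamParallelism) {
        List<Integer> result = new ArrayList<>();
        for (int j = 0; j < downstreamParallelism; j++) {
            // upstream slice starts before the downstream slice ends
            boolean startsBeforeEnd =
                (long) upstreamIndex * downstreamParallelism
                    < (long) (j + 1) * upstreamParallelism;
            // downstream slice starts before the upstream slice ends
            boolean endsAfterStart =
                (long) j * upstreamParallelism
                    < (long) (upstreamIndex + 1) * downstreamParallelism;
            if (startsBeforeEnd && endsAfterStart) {
                result.add(j);
            }
        }
        return result;
    }

    /** Cycles through a fixed subset of downstream channels in round-robin order. */
    public static final class RoundRobin {
        private final List<Integer> channels;
        private int next = 0;

        public RoundRobin(List<Integer> channels) {
            this.channels = channels;
        }

        /** Returns the channel for the next element and advances the cursor. */
        public int selectChannel() {
            int channel = channels.get(next);
            next = (next + 1) % channels.size();
            return channel;
        }
    }

    public static void main(String[] args) {
        // {upstream parallelism, downstream parallelism}
        int[][] cases = {{2, 4}, {4, 2}, {2, 3}};
        for (int[] c : cases) {
            for (int i = 0; i < c[0]; i++) {
                System.out.println("upstream " + i + "/" + c[0]
                    + " -> downstream " + targets(i, c[0], c[1]) + " of " + c[1]);
            }
        }
    }
}
```

With upstream parallelism 2 and downstream parallelism 3, this rule connects both upstream instances to the middle downstream instance, which therefore has two inputs while the outer two have one each. That is one way the "differing number of inputs" case can arise.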

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aljoscha/flink pattern-x

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1589.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1589
    
----
commit 57c0d9ed044560721bf3d6bb465e48d3c2de555a
Author: Aljoscha Krettek <aljoscha.kret...@gmail.com>
Date:   2016-02-04T17:33:01Z

    [FLINK-3336] Add Semi-Rebalance Data Shipping for DataStream
    

----


> Add Semi-Rebalance Data Shipping for DataStream
> -----------------------------------------------
>
>                 Key: FLINK-3336
>                 URL: https://issues.apache.org/jira/browse/FLINK-3336
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>             Fix For: 1.0.0
>
>
> This feature has recently been requested on the user mailing list:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Distribution-of-sinks-among-the-nodes-td4640.html
> The new data shipping pattern would allow rebalancing data to only a subset
> of downstream operations.
> The subset of downstream operations to which the upstream operation would
> send elements depends on the degree of parallelism of both the upstream and
> downstream operation.
> For example, if the upstream operation has parallelism 2 and the downstream
> operation has parallelism 4, then one upstream operation would distribute
> elements to two downstream operations while the other upstream operation
> would distribute to the other two downstream operations. If, on the other
> hand, the downstream operation has parallelism 2 while the upstream
> operation has parallelism 4, then two upstream operations would distribute
> to one downstream operation while the other two upstream operations would
> distribute to the other downstream operation.
> In cases where the parallelisms are not multiples of each other, one or
> several downstream operations would have a differing number of inputs from
> upstream operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
