[ https://issues.apache.org/jira/browse/FLINK-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108548#comment-15108548 ]

ASF GitHub Bot commented on FLINK-3256:
---------------------------------------

GitHub user senorcarbone opened a pull request:

    https://github.com/apache/flink/pull/1526

    [FLINK-3256] Fix colocation group re-instantiation

    This PR fixes inconsistent colocation groups upon reconfiguration. The
    problem was that the shared colocation constraints were reset once for
    every ExecutionJobVertex in a group, rather than once per group. As a
    result, vertices in the same colocation group ended up being scheduled
    with different constraints, which led to incorrect redeployment.
    
    To fix this, the execution graph now keeps track of all distinct
    colocation groups and resets each of them once, outside the
    re-instantiation of the individual ExecutionJobVertices. A new test also
    checks that certain properties remain consistent after reconfiguration;
    more properties can be added to the same test to ensure that they are
    also maintained upon reconfiguration.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/senorcarbone/flink egfix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1526.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1526
    
----
commit a8f24b5003885596f48eaa73b24b94dfbc5380e6
Author: Paris Carbone
Date:   2016-01-20T02:03:41Z

    [FLINK-3256] Fix colocation group re-instantiation

----


> Invalid execution graph cleanup for jobs with colocation groups
> ---------------------------------------------------------------
>
>                 Key: FLINK-3256
>                 URL: https://issues.apache.org/jira/browse/FLINK-3256
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Runtime
>            Reporter: Paris Carbone
>            Assignee: Paris Carbone
>            Priority: Blocker
>
> Currently, upon restarting an execution graph, we clean up the colocation
> constraints of each group separately for every ExecutionJobVertex that
> belongs to it. This can lead to invalid reconfiguration upon a restart or
> any other activity that relies on state cleanup of the execution graph. For
> example, upon restarting a DataStream job with iterations, the following
> steps are executed:
> 1) IterationSource colocation group constraints are reset
> 2) IterationSource execution vertices are reset and create new colocation
> constraints
> 3) IterationSink colocation group constraints are reset
> 4) IterationSink execution vertices are reset and create different colocation
> constraints
> As a result, the IterationSource and IterationSink, which belong to the same
> colocation group, no longer share constraints.
> This can be trivially fixed by resetting colocation groups independently of
> ExecutionJobVertices, so that each group is reset only once per
> reconfiguration.
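The four steps above can be reproduced with a small, hypothetical model. It
does not use Flink's actual classes: `ColocationGroup`, `Constraint` and
`JobVertex` are illustrative stand-ins, and constraints are assumed to be
created lazily per subtask. Each vertex resets the shared group before
re-fetching its own constraints, so the sink discards the constraints the
source has just obtained.

```java
import java.util.ArrayList;
import java.util.List;

public class PerVertexResetBug {

    /** Hypothetical stand-in for a shared co-location constraint. */
    static final class Constraint {}

    /** Hypothetical colocation group that lazily creates one constraint per subtask index. */
    static final class ColocationGroup {
        private final List<Constraint> constraints = new ArrayList<>();

        Constraint getConstraint(int subtask) {
            while (constraints.size() <= subtask) {
                constraints.add(new Constraint());
            }
            return constraints.get(subtask);
        }

        void resetConstraints() {
            constraints.clear();
        }
    }

    /** Hypothetical job vertex holding one constraint per parallel subtask. */
    static final class JobVertex {
        final ColocationGroup group;
        final Constraint[] assigned;

        JobVertex(ColocationGroup group, int parallelism) {
            this.group = group;
            this.assigned = new Constraint[parallelism];
            fetchConstraints();
        }

        void fetchConstraints() {
            for (int i = 0; i < assigned.length; i++) {
                assigned[i] = group.getConstraint(i);
            }
        }

        /** Buggy restart: every vertex resets the SHARED group itself, then re-fetches. */
        void resetForNewExecutionBuggy() {
            group.resetConstraints();
            fetchConstraints();
        }
    }

    /** True if both vertices hold the identical constraint object for every subtask index. */
    static boolean sharesConstraints(JobVertex a, JobVertex b) {
        for (int i = 0; i < Math.min(a.assigned.length, b.assigned.length); i++) {
            if (a.assigned[i] != b.assigned[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ColocationGroup group = new ColocationGroup();
        JobVertex source = new JobVertex(group, 2);
        JobVertex sink = new JobVertex(group, 2);
        System.out.println("before restart: " + sharesConstraints(source, sink));
        source.resetForNewExecutionBuggy(); // steps 1) and 2)
        sink.resetForNewExecutionBuggy();   // steps 3) and 4)
        System.out.println("after restart: " + sharesConstraints(source, sink));
    }
}
```

Running `main` prints `before restart: true` followed by
`after restart: false`, mirroring how the IterationSource and IterationSink
drift apart after a restart.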



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
