[ https://issues.apache.org/jira/browse/FLINK-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358897#comment-16358897 ]

ASF GitHub Bot commented on FLINK-8629:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/5446

    [FLINK-8629] [flip6] Allow JobMaster to rescale jobs

    ## What is the purpose of the change
    
    This commit adds the functionality to rescale a job or parts of it to
    the JobMaster. In order to rescale a job, the JobMaster does the following:
    1. Take a savepoint
    2. Create a rescaled ExecutionGraph from the JobGraph
    3. Initialize it with the taken savepoint
    4. Suspend the old ExecutionGraph
    5. Restart the new ExecutionGraph once the old ExecutionGraph has been suspended
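
The five steps above can be sketched as a chain of futures. This is only an illustrative sketch, not the code in this pull request: the class RescaleSketch, its nested Graph type, the string states, and the method names triggerSavepoint and rescale are all hypothetical stand-ins, and the savepoint is simulated by an already completed future.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch of the five rescaling steps; not Flink's JobMaster code. */
final class RescaleSketch {

    /** Minimal stand-in for an ExecutionGraph (hypothetical, not Flink's class). */
    static final class Graph {
        final int parallelism;
        String restoredFrom;      // savepoint used to initialize this graph, if any
        String state = "CREATED";

        Graph(int parallelism) {
            this.parallelism = parallelism;
        }

        /** Suspends the graph; the returned future completes once it is suspended. */
        CompletableFuture<Void> suspend() {
            state = "SUSPENDED";
            return CompletableFuture.completedFuture(null);
        }

        void start() {
            state = "RUNNING";
        }
    }

    /** Records the order in which the steps ran. */
    final List<String> log = new ArrayList<>();
    Graph current = new Graph(2);

    RescaleSketch() {
        current.start();
    }

    /** Step 1: simulate taking a savepoint and return its (made-up) path. */
    CompletableFuture<String> triggerSavepoint() {
        log.add("savepoint");
        return CompletableFuture.completedFuture("savepoint-1");
    }

    /** Runs steps 1 to 5 in order; completes with the new graph once it runs. */
    CompletableFuture<Graph> rescale(int newParallelism) {
        return triggerSavepoint().thenCompose(savepointPath -> {
            Graph rescaled = new Graph(newParallelism);    // step 2: new graph
            log.add("create");
            rescaled.restoredFrom = savepointPath;         // step 3: initialize it
            log.add("restore");
            Graph old = current;
            log.add("suspend");
            return old.suspend().thenApply(ignored -> {    // step 4: suspend old graph
                rescaled.start();                          // step 5: start new graph
                log.add("start");
                current = rescaled;
                return rescaled;
            });
        });
    }

    public static void main(String[] args) {
        RescaleSketch sketch = new RescaleSketch();
        Graph result = sketch.rescale(4).join();
        System.out.println(sketch.log + " parallelism=" + result.parallelism);
    }
}
```

Chaining the start of the new graph onto the suspension future of the old one is what enforces step 5 ("once the old ExecutionGraph has been suspended") in this sketch.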
    
    This PR is based on #5445, #5444, #4510
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (no)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
      - The serializers: (no)
      - The runtime per-record code paths (performance sensitive): (no)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
      - The S3 file system connector: (no)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (yes)
      - If yes, how is the feature documented? (not documented)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink rescalingRpc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5446.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5446
    
----
commit ffc9edd8f41c4a8508170580f945c5b9ed911d01
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-01T13:37:15Z

    [hotfix] Fix checkstyle violations in ExecutionGraph

commit 38006cfd9fef14fa4aa0dc23cb6a4e4afd019006
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-01T17:04:06Z

    [FLINK-8627] Introduce new JobStatus#SUSPENDING
    
    The new JobStatus#SUSPENDING indicates that an ExecutionGraph has been
    suspended but its cleanup has not been done yet. Only after all Executions
    have been canceled will the ExecutionGraph enter the SUSPENDED state and
    complete the termination future accordingly.
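
The two-phase transition described in this commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not Flink's ExecutionGraph: the class SuspendingSketch, its three-value Status enum, and the methods suspend and executionCanceled are hypothetical, and the real JobStatus enum has more states.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch of the RUNNING -> SUSPENDING -> SUSPENDED transition. */
final class SuspendingSketch {

    /** Reduced set of states for illustration only. */
    enum Status { RUNNING, SUSPENDING, SUSPENDED }

    private Status status = Status.RUNNING;
    private int runningExecutions;
    private final CompletableFuture<Status> terminationFuture = new CompletableFuture<>();

    SuspendingSketch(int runningExecutions) {
        this.runningExecutions = runningExecutions;
    }

    /** Requests suspension; clean-up may still be outstanding afterwards. */
    void suspend() {
        if (status == Status.RUNNING) {
            status = Status.SUSPENDING;
            maybeFinishSuspension();
        }
    }

    /** Called once per execution when its cancellation has completed. */
    void executionCanceled() {
        runningExecutions--;
        maybeFinishSuspension();
    }

    /** Enters SUSPENDED only after every execution has been canceled. */
    private void maybeFinishSuspension() {
        if (status == Status.SUSPENDING && runningExecutions == 0) {
            status = Status.SUSPENDED;
            terminationFuture.complete(status);
        }
    }

    Status getStatus() {
        return status;
    }

    CompletableFuture<Status> getTerminationFuture() {
        return terminationFuture;
    }
}
```

A caller that waits on the termination future therefore only proceeds once clean-up is done, not at the moment suspension was requested.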

commit b9c77594b98c8fe8799a7149fbcfad6157d7aa5e
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-09T13:07:31Z

    [FLINK-8626] Introduce BackPressureStatsTracker interface
    
    Renames BackPressureStatsTracker to BackPressureStatsTrackerImpl and
    introduces a BackPressureStatsTracker interface. This makes testing easier
    because tests no longer have to set up all the different components.

commit f0d7d8e69c16261f140faf2943fd15485837609b
Author: Tzu-Li (Gordon) Tai <tzulitai@...>
Date:   2017-08-10T05:41:40Z

    [FLINK-7124] [flip-6] Add test to verify rescaling JobGraphs works correctly
    
    This commit adds two tests to verify behaviours of rescaling JobGraphs:
    1. JobGraphs can be consecutively rescaled to arbitrary valid DOPs
    2. Rescaling beyond max parallelism would fail
    
    The second test, however, is disabled for now because it does not
    fail as expected.

commit 2a473673e00d3ab7a2597eb5182b162f342c2d96
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-01T15:14:53Z

    [FLINK-8546] [flip6] Respect savepoints and restore from latest checkpoints
    
    Let the JobMaster respect checkpoints and savepoints. The JobMaster will
    always try to restore the latest checkpoint if there is one available. Next
    it will check whether savepoint restore settings have been set. If so, then
    it will try to restore the savepoint. Only if these settings are not set,
    the job will be started from scratch.
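
The restore order in this commit message can be sketched as a small decision function. The sketch assumes that the savepoint settings are consulted only when no checkpoint could be restored, which is one reading of the message; the class RestoreOrderSketch, the method chooseStartState, and the string results are hypothetical and not part of Flink.

```java
import java.util.Optional;

/** Hypothetical sketch of the restore order: checkpoint, then savepoint, then scratch. */
final class RestoreOrderSketch {

    /**
     * Picks the state a job starts from.
     * Assumption: the savepoint is only considered if no checkpoint is available.
     */
    static String chooseStartState(Optional<String> latestCheckpoint,
                                   Optional<String> savepointPath) {
        if (latestCheckpoint.isPresent()) {
            return "checkpoint:" + latestCheckpoint.get();
        }
        if (savepointPath.isPresent()) {
            return "savepoint:" + savepointPath.get();
        }
        return "scratch";
    }

    public static void main(String[] args) {
        System.out.println(chooseStartState(Optional.of("chk-7"), Optional.of("sp-1")));
        System.out.println(chooseStartState(Optional.empty(), Optional.of("sp-1")));
        System.out.println(chooseStartState(Optional.empty(), Optional.empty()));
    }
}
```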

commit f0f24a2701298010fd2403a03b8a5ff98d41eb3c
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-09T13:18:11Z

    [hotfix] [tests] Simplify JobMasterTest

commit 930106c3383ea1179475e27ff608bf4df2ac0773
Author: Tzu-Li (Gordon) Tai <tzulitai@...>
Date:   2017-08-10T05:57:09Z

    [hotfix] Refactor graph verification code in ExecutionGraphConstructionTest
    
    The refactoring reuses utility methods in ExecutionGraphTestUtils to
    verify constructed ExecutionGraphs.

commit f902b9eb8776d7df8a9b62fa556756d00b3b4c15
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-02T08:57:14Z

    [FLINK-7124] Introduce parallelism <= max parallelism check into ExecutionJobVertex
    
    Check that the parallelism does not exceed the max parallelism when
    creating an ExecutionJobVertex.
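
A check of this kind could look roughly like the sketch below. It follows the "parallelism <= max parallelism" condition from the commit title; the class ParallelismCheckSketch, the method checkParallelism, and the choice of IllegalArgumentException are assumptions made for illustration and are not taken from ExecutionJobVertex.

```java
/** Hypothetical sketch of a "parallelism <= max parallelism" precondition. */
final class ParallelismCheckSketch {

    /** Rejects a parallelism that exceeds the configured maximum. */
    static void checkParallelism(int parallelism, int maxParallelism) {
        if (parallelism > maxParallelism) {
            throw new IllegalArgumentException(
                "Parallelism " + parallelism
                    + " exceeds the max parallelism " + maxParallelism + ".");
        }
    }

    public static void main(String[] args) {
        checkParallelism(4, 4); // equal values are allowed
        try {
            checkParallelism(5, 4);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```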

commit 7c6a18e4fdcbec7d0cdf38d16baab699eec7b208
Author: Till Rohrmann <trohrmann@...>
Date:   2018-02-01T13:37:37Z

    [FLINK-8629] [flip6] Allow JobMaster to rescale jobs
    
    This commit adds the functionality to rescale a job or parts of it to
    the JobMaster. In order to rescale a job, the JobMaster does the following:
    1. Take a savepoint
    2. Create a rescaled ExecutionGraph from the JobGraph
    3. Initialize it with the taken savepoint
    4. Suspend the old ExecutionGraph
    5. Restart the new ExecutionGraph once the old ExecutionGraph has been suspended

----


> Allow JobMaster to rescale jobs
> -------------------------------
>
>                 Key: FLINK-8629
>                 URL: https://issues.apache.org/jira/browse/FLINK-8629
>             Project: Flink
>          Issue Type: New Feature
>          Components: Distributed Coordination
>    Affects Versions: 1.5.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Major
>              Labels: flip-6
>             Fix For: 1.5.0
>
>
> The {{JobMaster}} should be able to rescale a job or a subset of its 
> operators. In order to do that we have to expose RPC calls to trigger this 
> action.
> The rescaling works by first taking a savepoint, then suspending the old job, 
> rescaling it, and then restarting it from the taken savepoint.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
