[ https://issues.apache.org/jira/browse/FLINK-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358897#comment-16358897 ]
ASF GitHub Bot commented on FLINK-8629:
---------------------------------------
GitHub user tillrohrmann opened a pull request:
https://github.com/apache/flink/pull/5446
[FLINK-8629] [flip6] Allow JobMaster to rescale jobs
## What is the purpose of the change
This commit adds the functionality to rescale a job or parts of it to
the JobMaster. In order to rescale a job, the JobMaster does the following:
1. Take a savepoint
2. Create a rescaled ExecutionGraph from the JobGraph
3. Initialize it with the taken savepoint
4. Suspend the old ExecutionGraph
5. Restart the new ExecutionGraph once the old ExecutionGraph has been
suspended
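
The five steps above can be sketched as a chain of futures. This is only an illustration of the ordering, not the code in this PR: `RescaleSketch`, `Graph`, `triggerSavepoint` and `rescale` are hypothetical names, and the savepoint is faked with an already completed future.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: none of these names are Flink's actual API.
final class RescaleSketch {

    /** Minimal stand-in for an ExecutionGraph. */
    static final class Graph {
        final int parallelism;
        String restoredFrom; // savepoint path this graph was initialized with, if any
        boolean running;
        final CompletableFuture<Void> terminationFuture = new CompletableFuture<>();

        Graph(int parallelism) {
            this.parallelism = parallelism;
        }

        void suspend() {
            running = false;
            // A real graph would complete this only after all executions are canceled.
            terminationFuture.complete(null);
        }
    }

    final List<String> log = new ArrayList<>();
    Graph current;

    RescaleSketch(int initialParallelism) {
        current = new Graph(initialParallelism);
        current.running = true;
    }

    // Step 1: take a savepoint (faked here with a fixed path).
    CompletableFuture<String> triggerSavepoint() {
        log.add("savepoint");
        return CompletableFuture.completedFuture("savepoint-1");
    }

    CompletableFuture<Graph> rescale(int newParallelism) {
        final Graph old = current;
        return triggerSavepoint().thenCompose(path -> {
            // Step 2: create the rescaled graph.
            Graph rescaled = new Graph(newParallelism);
            log.add("create");
            // Step 3: initialize it with the taken savepoint.
            rescaled.restoredFrom = path;
            log.add("restore");
            // Step 4: suspend the old graph.
            old.suspend();
            log.add("suspend");
            // Step 5: start the new graph only once the old one has terminated.
            return old.terminationFuture.thenApply(ignored -> {
                rescaled.running = true;
                current = rescaled;
                log.add("restart");
                return rescaled;
            });
        });
    }

    public static void main(String[] args) {
        RescaleSketch jobMaster = new RescaleSketch(2);
        Graph rescaled = jobMaster.rescale(4).join();
        System.out.println(jobMaster.log + " -> parallelism " + rescaled.parallelism);
    }
}
```

Taking the savepoint before touching the old graph means a failed savepoint leaves the running job untouched in this sketch.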
This PR is based on #5445, #5444, #4510
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
- The S3 file system connector: (no)
## Documentation
- Does this pull request introduce a new feature? (yes)
- If yes, how is the feature documented? (not documented)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tillrohrmann/flink rescalingRpc
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/5446.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5446
----
commit ffc9edd8f41c4a8508170580f945c5b9ed911d01
Author: Till Rohrmann <trohrmann@...>
Date: 2018-02-01T13:37:15Z
[hotfix] Fix checkstyle violations in ExecutionGraph
commit 38006cfd9fef14fa4aa0dc23cb6a4e4afd019006
Author: Till Rohrmann <trohrmann@...>
Date: 2018-02-01T17:04:06Z
[FLINK-8627] Introduce new JobStatus#SUSPENDING
The new JobStatus#SUSPENDING indicates that an ExecutionGraph has been
suspended but its cleanup has not yet finished. Only after all Executions
have been canceled does the ExecutionGraph enter the SUSPENDED state and
complete the termination future accordingly.
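
A rough sketch of the two-phase transition described in this commit message. It is an illustration under assumptions, not the ExecutionGraph code: `SuspendSketch`, `executionCanceled` and the execution counter are hypothetical.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of SUSPENDING -> SUSPENDED; not Flink's ExecutionGraph.
final class SuspendSketch {

    enum Status { RUNNING, SUSPENDING, SUSPENDED }

    Status status = Status.RUNNING;
    int pendingExecutions;
    final CompletableFuture<Status> terminationFuture = new CompletableFuture<>();

    SuspendSketch(int executions) {
        this.pendingExecutions = executions;
    }

    // Suspension was requested, but executions may still be canceling.
    void suspend() {
        if (status == Status.RUNNING) {
            status = Status.SUSPENDING;
            maybeFinish();
        }
    }

    // Called once for every execution that has finished canceling.
    void executionCanceled() {
        if (pendingExecutions > 0) {
            pendingExecutions--;
        }
        maybeFinish();
    }

    // Enter SUSPENDED and complete the future only when nothing is left to cancel.
    private void maybeFinish() {
        if (status == Status.SUSPENDING && pendingExecutions == 0) {
            status = Status.SUSPENDED;
            terminationFuture.complete(status);
        }
    }
}
```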
commit b9c77594b98c8fe8799a7149fbcfad6157d7aa5e
Author: Till Rohrmann <trohrmann@...>
Date: 2018-02-09T13:07:31Z
[FLINK-8626] Introduce BackPressureStatsTracker interface
Renames BackPressureStatsTracker to BackPressureStatsTrackerImpl and
introduces a BackPressureStatsTracker interface. This makes testing easier
because we don't have to set up all the different components.
commit f0d7d8e69c16261f140faf2943fd15485837609b
Author: Tzu-Li (Gordon) Tai <tzulitai@...>
Date: 2017-08-10T05:41:40Z
[FLINK-7124] [flip-6] Add test to verify rescaling JobGraphs works correctly
This commit adds two tests to verify behaviours of rescaling JobGraphs:
1. JobGraphs can be consecutively rescaled to arbitrary valid DOPs
2. Rescaling beyond max parallelism would fail
The second test, however, is temporarily disabled since it doesn't
properly fail.
commit 2a473673e00d3ab7a2597eb5182b162f342c2d96
Author: Till Rohrmann <trohrmann@...>
Date: 2018-02-01T15:14:53Z
[FLINK-8546] [flip6] Respect savepoints and restore from latest checkpoints
Let the JobMaster respect checkpoints and savepoints. The JobMaster will
always try to restore the latest checkpoint if there is one available. Next
it will check whether savepoint restore settings have been set. If so, then
it will try to restore the savepoint. Only if these settings are not set,
the job will be started from scratch.
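
One way to read this is as a precedence order: latest checkpoint first, then the configured savepoint, then a fresh start. The sketch below encodes that reading only. `RestoreSketch` and `chooseStartState` are hypothetical, and whether a savepoint is still consulted after a successful checkpoint restore is an assumption here (the sketch assumes it is not).

```java
import java.util.Optional;

// Hypothetical sketch of the restore precedence; not the JobMaster code.
final class RestoreSketch {

    static String chooseStartState(Optional<String> latestCheckpoint,
                                   Optional<String> savepointPath) {
        // 1. Prefer the latest checkpoint if one is available.
        if (latestCheckpoint.isPresent()) {
            return "checkpoint:" + latestCheckpoint.get();
        }
        // 2. Otherwise use the savepoint from the restore settings, if set.
        if (savepointPath.isPresent()) {
            return "savepoint:" + savepointPath.get();
        }
        // 3. Otherwise start from scratch.
        return "fresh";
    }
}
```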
commit f0f24a2701298010fd2403a03b8a5ff98d41eb3c
Author: Till Rohrmann <trohrmann@...>
Date: 2018-02-09T13:18:11Z
[hotfix] [tests] Simplify JobMasterTest
commit 930106c3383ea1179475e27ff608bf4df2ac0773
Author: Tzu-Li (Gordon) Tai <tzulitai@...>
Date: 2017-08-10T05:57:09Z
[hotfix] Refactor graph verification code in ExecutionGraphConstructionTest
The refactoring reuses utility methods in ExecutionGraphTestUtils to
verify constructed ExecutionGraphs.
commit f902b9eb8776d7df8a9b62fa556756d00b3b4c15
Author: Till Rohrmann <trohrmann@...>
Date: 2018-02-02T08:57:14Z
[FLINK-7124] Introduce parallelism <= max parallelism check into
ExecutionJobVertex
Check that the parallelism is not larger than the max parallelism when
creating an ExecutionJobVertex.
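
A minimal sketch of such a check, assuming it rejects the value with an exception. `ParallelismCheckSketch` and `checkParallelism` are hypothetical names, and the exception type is an assumption rather than what ExecutionJobVertex actually throws.

```java
// Hypothetical sketch of a parallelism <= max parallelism check.
final class ParallelismCheckSketch {

    static int checkParallelism(int parallelism, int maxParallelism) {
        // Equal values are allowed; only exceeding the maximum is rejected.
        if (parallelism > maxParallelism) {
            throw new IllegalArgumentException(
                "Parallelism " + parallelism
                    + " exceeds the max parallelism " + maxParallelism + ".");
        }
        return parallelism;
    }
}
```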
commit 7c6a18e4fdcbec7d0cdf38d16baab699eec7b208
Author: Till Rohrmann <trohrmann@...>
Date: 2018-02-01T13:37:37Z
[FLINK-8629] [flip6] Allow JobMaster to rescale jobs
This commit adds the functionality to rescale a job or parts of it to
the JobMaster. In order to rescale a job, the JobMaster does the following:
1. Take a savepoint
2. Create a rescaled ExecutionGraph from the JobGraph
3. Initialize it with the taken savepoint
4. Suspend the old ExecutionGraph
5. Restart the new ExecutionGraph once the old ExecutionGraph has been
suspended
----
> Allow JobMaster to rescale jobs
> -------------------------------
>
> Key: FLINK-8629
> URL: https://issues.apache.org/jira/browse/FLINK-8629
> Project: Flink
> Issue Type: New Feature
> Components: Distributed Coordination
> Affects Versions: 1.5.0
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
> Priority: Major
> Labels: flip-6
> Fix For: 1.5.0
>
>
> The {{JobMaster}} should be able to rescale a job or a subset of its
> operators. In order to do that we have to expose RPC calls to trigger this
> action.
> The rescaling works by first taking a savepoint, then suspending the old job,
> rescaling it, and then restarting it from the taken savepoint.