GitHub user tillrohrmann opened a pull request:
https://github.com/apache/flink/pull/6208
Disable local recovery scheduling
## What is the purpose of the change
Introduce a `SchedulingStrategy`, which is used by the `SlotPool` to schedule tasks. The default implementation is `LocationPreferenceSchedulingStrategy`, which tries to schedule tasks to their preferred locations. In order to support local recovery, the `PreviousAllocationSchedulingStrategy` schedules tasks to their previous allocation.
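The following minimal, self-contained sketch makes the difference between the two strategies concrete. It assumes that a free slot can be described by an allocation id and a host, and that a task's request carries a set of preferred hosts plus its previous allocation id. The types and the `findMatch` signature are hypothetical simplifications, not Flink's actual `SchedulingStrategy` interface. The fallback from previous allocation to location preference is also an assumption of this sketch. Records require Java 16+.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical sketch of the two scheduling strategies. All types and
 * signatures are simplified stand-ins, not Flink's real API.
 */
class SchedulingStrategySketch {

    /** A free slot: the allocation it belongs to and the TaskManager host it lives on. */
    record Slot(String allocationId, String host) {}

    /** A task's request: preferred hosts and the allocation it ran in before (may be null). */
    record SlotProfile(Set<String> preferredHosts, String previousAllocationId) {}

    interface SchedulingStrategy {
        Optional<Slot> findMatch(SlotProfile profile, List<Slot> freeSlots);
    }

    /** Prefers a slot on one of the preferred hosts, otherwise takes any free slot. */
    static class LocationPreferenceSchedulingStrategy implements SchedulingStrategy {
        @Override
        public Optional<Slot> findMatch(SlotProfile profile, List<Slot> freeSlots) {
            for (Slot slot : freeSlots) {
                if (profile.preferredHosts().contains(slot.host())) {
                    return Optional.of(slot);
                }
            }
            return freeSlots.stream().findFirst();
        }
    }

    /**
     * Prefers the slot whose allocation id matches the task's previous allocation,
     * so locally kept state can be reused. Falling back to location preference
     * is an assumption of this sketch.
     */
    static class PreviousAllocationSchedulingStrategy extends LocationPreferenceSchedulingStrategy {
        @Override
        public Optional<Slot> findMatch(SlotProfile profile, List<Slot> freeSlots) {
            for (Slot slot : freeSlots) {
                if (slot.allocationId().equals(profile.previousAllocationId())) {
                    return Optional.of(slot);
                }
            }
            return super.findMatch(profile, freeSlots);
        }
    }

    public static void main(String[] args) {
        List<Slot> free = List.of(new Slot("alloc-1", "hostA"), new Slot("alloc-2", "hostB"));
        SlotProfile profile = new SlotProfile(Set.of("hostA"), "alloc-2");

        // Location preference picks alloc-1 because it is on the preferred hostA.
        System.out.println(new LocationPreferenceSchedulingStrategy().findMatch(profile, free));
        // Previous allocation picks alloc-2 because the task ran there before.
        System.out.println(new PreviousAllocationSchedulingStrategy().findMatch(profile, free));
    }
}
```

For the same request, the two strategies can choose different slots. Only the previous-allocation strategy steers the task back to where its local state may still be available.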
The scheduling strategy is selected based on the configuration option `state.backend.local-recovery`. If it is set to `true`, `PreviousAllocationSchedulingStrategy` is selected; otherwise, `LocationPreferenceSchedulingStrategy` is selected.
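The selection rule can be sketched as a single function. This is a hypothetical illustration. A `Map<String, String>` stands in for Flink's configuration object, and an enum stands in for the two strategy classes. Treating a missing key as `false` is an assumption of this sketch.

```java
import java.util.Map;

/** Hypothetical sketch of picking the scheduling strategy from configuration. */
class StrategySelectionSketch {

    /** Option key as named in the pull request description. */
    static final String LOCAL_RECOVERY_KEY = "state.backend.local-recovery";

    /** Placeholder for the two strategy classes introduced by the pull request. */
    enum StrategyKind { LOCATION_PREFERENCE, PREVIOUS_ALLOCATION }

    /**
     * Local recovery enabled  -> schedule to the previous allocation.
     * Local recovery disabled -> schedule by location preference.
     * A missing key is treated as "false" (assumption of this sketch).
     */
    static StrategyKind selectSchedulingStrategy(Map<String, String> config) {
        boolean localRecovery =
                Boolean.parseBoolean(config.getOrDefault(LOCAL_RECOVERY_KEY, "false"));
        return localRecovery ? StrategyKind.PREVIOUS_ALLOCATION : StrategyKind.LOCATION_PREFERENCE;
    }

    public static void main(String[] args) {
        // Local recovery on: previous-allocation scheduling.
        System.out.println(selectSchedulingStrategy(Map.of(LOCAL_RECOVERY_KEY, "true")));
        // Option absent: falls back to location-preference scheduling.
        System.out.println(selectSchedulingStrategy(Map.of()));
    }
}
```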
## Brief change log
- Introduce a `SlotPoolFactory` to make the `SlotPool` instantiation configurable
- Introduce `SchedulingStrategy` as the scheduling logic abstraction
- Introduce `LocationPreferenceSchedulingStrategy` as the default implementation
- Introduce `PreviousAllocationSchedulingStrategy` as the scheduling strategy for local recovery
- Move the scheduling logic out of `SlotProfile` into `SchedulingStrategy`
- Instantiate the `SchedulingStrategy` based on the `state.backend.local-recovery` option
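A `SlotPoolFactory` means the code that creates a `SlotPool` no longer decides how that pool is configured. The following minimal sketch illustrates the pattern. `DefaultSlotPoolFactory`, `createSlotPool`, and the string-based job id are hypothetical names chosen for illustration, not the signatures used in this pull request.

```java
/** Hypothetical sketch of a factory that injects the scheduling strategy into each SlotPool. */
class SlotPoolFactorySketch {

    /** Simplified strategy stand-in: only describes itself. */
    interface SchedulingStrategy { String describe(); }

    /** Minimal pool that only remembers the strategy it was built with. */
    static final class SlotPool {
        private final String jobId;
        private final SchedulingStrategy strategy;

        SlotPool(String jobId, SchedulingStrategy strategy) {
            this.jobId = jobId;
            this.strategy = strategy;
        }

        String describe() { return jobId + " uses " + strategy.describe(); }
    }

    /** Creates one SlotPool per job; tests can supply their own implementation. */
    interface SlotPoolFactory {
        SlotPool createSlotPool(String jobId);
    }

    /** Default factory: injects a strategy that was chosen once, e.g. from configuration. */
    static final class DefaultSlotPoolFactory implements SlotPoolFactory {
        private final SchedulingStrategy strategy;

        DefaultSlotPoolFactory(SchedulingStrategy strategy) { this.strategy = strategy; }

        @Override
        public SlotPool createSlotPool(String jobId) { return new SlotPool(jobId, strategy); }
    }

    public static void main(String[] args) {
        SchedulingStrategy previousAllocation = () -> "previous-allocation scheduling";
        SlotPoolFactory factory = new DefaultSlotPoolFactory(previousAllocation);
        // prints "job-42 uses previous-allocation scheduling"
        System.out.println(factory.createSlotPool("job-42").describe());
    }
}
```

Callers depend only on the factory interface. A test can therefore pass a lambda that builds a pool with a fixed strategy, while production code can pass a factory whose strategy was derived from `state.backend.local-recovery`.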
## Verifying this change
- Added `SchedulingITCase`, which covers the scheduling problem with local recovery
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes)
- The S3 file system connector: (no)
## Documentation
- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented? (not applicable)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tillrohrmann/flink disableLocalRecoveryScheduling
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/6208.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #6208
----
commit 9fd63001497793ce0ed7e224d5678842e34484cd
Author: Till Rohrmann <trohrmann@...>
Date: 2018-06-25T08:16:31Z
[hotfix] Introduce SlotPoolResource and TestingRpcServiceResource
commit 1f867c9a1f3524dce0329d3198b950e2ad62a82c
Author: Till Rohrmann <trohrmann@...>
Date: 2018-06-25T08:34:30Z
[hotfix] Introduce SlotPoolFactory to make SlotPool instantiation configurable
commit b552a2344529a272ec098bfd9c82d4a30f860726
Author: Till Rohrmann <trohrmann@...>
Date: 2018-06-26T08:01:19Z
[hotfix] Remove SlotIdleTimeout from JobMasterConfiguration
commit 5d80afdf11ab35b17e00b846f00ba586880fc8d7
Author: Till Rohrmann <trohrmann@...>
Date: 2018-06-22T14:34:10Z
[FLINK-9634] Disable local recovery scheduling if local recovery is disabled