GitHub user tillrohrmann opened a pull request:
https://github.com/apache/flink/pull/3394
[FLINK-5810] [flip6] Introduce a hardened slot manager
This PR is based on #3310.
Harden the slot manager so that it copes better with lost and out-of-order
messages from the TaskManagers. The basic idea is that the TaskManagers are
considered the ground truth, and the SlotManager tries to maintain a consistent
view of what they report to it. This assumes that the TaskManagers regularly
report their slot status to the SlotManager, piggybacked on the heartbeat
signals to the ResourceManager (not yet implemented, though). That way, lost
and out-of-order messages can be handled because the SlotManager will
eventually converge on a consistent view of the actual slot allocation.
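To illustrate the "TaskManager is the ground truth" idea, here is a minimal sketch. It is not the code in this PR: the class SlotViewSketch, its methods and the FREE marker are hypothetical names chosen for the example. It only shows that a full slot report overwrites whatever the manager assumed locally, so a lost allocation message is corrected by the next report.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch, not Flink's SlotManager: the reported state always wins. */
final class SlotViewSketch {
    /** Marker for a slot without an allocation (an assumption of this sketch). */
    static final String FREE = "";

    // taskManagerId -> (slotId -> allocationId, or FREE)
    private final Map<String, Map<String, String>> slotsByTaskManager = new HashMap<>();

    /** Optimistic local bookkeeping when an allocation request is sent; the message may get lost. */
    void markRequested(String taskManagerId, String slotId, String allocationId) {
        slotsByTaskManager
                .computeIfAbsent(taskManagerId, id -> new HashMap<>())
                .put(slotId, allocationId);
    }

    /** A full slot report replaces everything assumed so far about that TaskManager. */
    void onSlotReport(String taskManagerId, Map<String, String> reportedSlots) {
        slotsByTaskManager.put(taskManagerId, new HashMap<>(reportedSlots));
    }

    /** Returns the allocation currently believed to occupy the slot, or FREE. */
    String allocationOf(String taskManagerId, String slotId) {
        return slotsByTaskManager
                .getOrDefault(taskManagerId, Collections.emptyMap())
                .getOrDefault(slotId, FREE);
    }
}
```

If markRequested("tm-1", "slot-0", "alloc-1") is followed by a report in which tm-1 still lists slot-0 as free, allocationOf returns FREE again, and the manager knows the request has to be retried.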
Additionally, the hardened SlotManager registers a timeout for idle
TaskManagers and for pending slot requests. If the timeout expires, the idle
TaskManagers are released and the pending slot requests are failed,
respectively. This prevents resource leaks and wasteful resource allocation.
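The timeout handling for pending slot requests could look roughly like the sketch below. It is an illustration under assumptions, not the PR's implementation: SlotRequestTimeoutSketch, requestSlot and fulfill are hypothetical names, and the timeout is passed in as plain milliseconds. The same pattern would apply to releasing idle TaskManagers.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch: fail pending slot requests that are not fulfilled in time. */
final class SlotRequestTimeoutSketch {
    private final ScheduledExecutorService scheduler;
    private final long timeoutMillis;
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    SlotRequestTimeoutSketch(ScheduledExecutorService scheduler, long timeoutMillis) {
        this.scheduler = scheduler;
        this.timeoutMillis = timeoutMillis;
    }

    /** Registers a request and schedules its timeout; the future yields the assigned slot id. */
    CompletableFuture<String> requestSlot(String requestId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(requestId, future);
        scheduler.schedule(() -> {
            // Fail only if this exact request is still pending when the timer fires.
            if (pending.remove(requestId, future)) {
                future.completeExceptionally(
                        new TimeoutException("Slot request " + requestId + " timed out"));
            }
        }, timeoutMillis, TimeUnit.MILLISECONDS);
        return future;
    }

    /** Completes a pending request; a late call after the timeout is ignored. */
    void fulfill(String requestId, String slotId) {
        CompletableFuture<String> future = pending.remove(requestId);
        if (future != null) {
            future.complete(slotId);
        }
    }
}
```

For brevity the sketch leaves the timer running after a request has been fulfilled; the timer then finds nothing to remove and does nothing. A fuller version would keep the returned ScheduledFuture and cancel it.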
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tillrohrmann/flink newSlotManager
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3394.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3394
----
commit 336e479e9892acbdaf54b36d98dc810ea7192c39
Author: Till Rohrmann <[email protected]>
Date: 2017-02-14T15:50:43Z
[FLINK-5798] [rpc] Let the RpcService provide a ScheduledExecutorService
This PR adds the getScheduledExecutorService method to the RpcService
interface. So henceforth all RpcService implementations have to provide a
ScheduledExecutorService implementation.
Currently, we only support the AkkaRpcService. The AkkaRpcService returns a
ScheduledExecutorService proxy which forwards the schedule calls to the
ActorSystem's internal scheduler.
commit f5a7de2811ef21b55edcb74ad247664d251ac071
Author: Till Rohrmann <[email protected]>
Date: 2017-02-22T16:49:33Z
Introduce ScheduledExecutor interface to hide service methods from the
ScheduledExecutorService
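As a rough illustration of hiding the service methods, the sketch below exposes only scheduling and keeps lifecycle calls such as shutdown() out of reach of callers. The names SchedulingOnly and SchedulingOnlyAdapter are hypothetical, and the interface introduced by this commit may offer more methods than the single one shown here.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical narrow view: scheduling only, no shutdown() or awaitTermination(). */
interface SchedulingOnly {
    ScheduledFuture<?> schedule(Runnable command, long delay, TimeUnit unit);
}

/** Forwards schedule calls to a full ScheduledExecutorService that stays private. */
final class SchedulingOnlyAdapter implements SchedulingOnly {
    private final ScheduledExecutorService delegate;

    SchedulingOnlyAdapter(ScheduledExecutorService delegate) {
        this.delegate = delegate;
    }

    @Override
    public ScheduledFuture<?> schedule(Runnable command, long delay, TimeUnit unit) {
        return delegate.schedule(command, delay, unit);
    }
}
```

Code that receives a SchedulingOnly instance can schedule work but cannot shut down the executor that is shared with the rest of the system.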
commit 857a8f7e6b363bc4a57d8950a74367e8e8bfe195
Author: Till Rohrmann <[email protected]>
Date: 2017-02-09T10:59:45Z
[FLINK-5810] [flip6] Introduce a hardened slot manager
Harden the slot manager so that it copes better with lost and out-of-order
messages from the TaskManagers. The basic idea is that the TaskManagers are
considered the ground truth, and the SlotManager tries to maintain a consistent
view of what they report to it. This assumes that the TaskManagers regularly
report their slot status to the SlotManager, piggybacked on the heartbeat
signals to the ResourceManager.
That way, lost and out-of-order messages can be handled because the
SlotManager will eventually converge on a consistent view of the actual slot
allocation.
Additionally, the hardened SlotManager registers a timeout for idle
TaskManagers and for pending slot requests. If the timeout expires, the idle
TaskManagers are released and the pending slot requests are failed. This
prevents resource leaks and wasteful resource allocation.
----