GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/3394

    [FLINK-5810] [flip6] Introduce a hardened slot manager

    This PR is based on #3310.
    
    Harden the slot manager so that it copes better with lost and out-of-order
    messages from the TaskManagers. The basic idea is that the TaskManagers are
    considered the ground truth, and the SlotManager tries to maintain a
    consistent view of what they report to it. This assumes that the
    TaskManagers regularly report their slot status to the SlotManager,
    piggybacked on the heartbeat signals to the ResourceManager (not yet
    implemented, though). That way lost and out-of-order messages can be
    handled, because the SlotManager will eventually converge on a consistent
    view of the actual slot allocation.
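
The convergence idea can be illustrated with a minimal sketch. It is not the code of this PR: the class SlotViewSketch and its methods are hypothetical names chosen for illustration, and slots and allocations are reduced to plain strings. The only point shown is that a full report from a TaskManager overwrites whatever the manager believed before, so a belief based on a lost message is corrected by the next report.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch only. All names here are hypothetical and not taken
 * from the Flink code base.
 */
final class SlotViewSketch {

    /** TaskManager id -> (slot id -> allocation id, or null if the slot is free). */
    private final Map<String, Map<String, String>> viewByTaskManager = new HashMap<>();

    /** Records what the manager believes after it sent an allocation request. */
    void markAllocated(String taskManagerId, String slotId, String allocationId) {
        viewByTaskManager
                .computeIfAbsent(taskManagerId, id -> new HashMap<>())
                .put(slotId, allocationId);
    }

    /**
     * Applies a full slot report. The report is treated as the ground truth,
     * so it replaces the previous view of this TaskManager entirely.
     */
    void applySlotReport(String taskManagerId, Map<String, String> reportedSlots) {
        viewByTaskManager.put(taskManagerId, new HashMap<>(reportedSlots));
    }

    /** Returns the allocation id for the slot, or null if it is free or unknown. */
    String allocationOf(String taskManagerId, String slotId) {
        Map<String, String> slots = viewByTaskManager.get(taskManagerId);
        return slots == null ? null : slots.get(slotId);
    }
}
```

In this sketch, if the manager calls markAllocated but the request never reaches the TaskManager, the next report still lists the slot as free and applySlotReport restores the correct view.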
    
    Additionally, the hardened SlotManager registers a timeout for idle
    TaskManagers and for pending slot requests. If the timeout expires, the
    TaskManagers are released and the slot requests are failed, respectively.
    This prevents resource leaks and wasteful resource allocation.
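
The timeout handling for pending slot requests could look roughly like the following sketch. It is an assumption-laden illustration, not the implementation in this PR: PendingRequestTimeoutSketch, requestSlot and fulfill are hypothetical names, requests and slots are plain strings, and only the standard java.util.concurrent API is used. Idle TaskManagers could be handled with the same pattern, which is not shown here.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustrative sketch only. All names here are hypothetical. */
final class PendingRequestTimeoutSketch {

    private final ScheduledExecutorService scheduler;
    private final long timeoutMillis;
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    PendingRequestTimeoutSketch(ScheduledExecutorService scheduler, long timeoutMillis) {
        this.scheduler = scheduler;
        this.timeoutMillis = timeoutMillis;
    }

    /** Registers a pending request that fails unless it is fulfilled in time. */
    CompletableFuture<String> requestSlot(String requestId) {
        CompletableFuture<String> result = new CompletableFuture<>();
        pending.put(requestId, result);

        // Fail the request when the timeout expires and it is still pending.
        ScheduledFuture<?> timer = scheduler.schedule(() -> {
            if (pending.remove(requestId, result)) {
                result.completeExceptionally(
                        new TimeoutException("Slot request " + requestId + " timed out"));
            }
        }, timeoutMillis, TimeUnit.MILLISECONDS);

        // Once the request completes either way, the timer is no longer needed.
        result.whenComplete((slot, failure) -> timer.cancel(false));
        return result;
    }

    /** Completes a pending request with the slot that was assigned to it. */
    void fulfill(String requestId, String slotId) {
        CompletableFuture<String> result = pending.remove(requestId);
        if (result != null) {
            result.complete(slotId);
        }
    }
}
```

Removing the entry from the pending map before completing the future means that a fulfilled request and an expired timer cannot both complete the same request.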

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink newSlotManager

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3394.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3394
    
----
commit 336e479e9892acbdaf54b36d98dc810ea7192c39
Author: Till Rohrmann <[email protected]>
Date:   2017-02-14T15:50:43Z

    [FLINK-5798] [rpc] Let the RpcService provide a ScheduledExecutorService
    
    This PR adds the getScheduledExecutorService method to the RpcService
    interface. So henceforth all RpcService implementations have to provide a
    ScheduledExecutorService implementation.
    
    Currently, we only support the AkkaRpcService. The AkkaRpcService returns a
    ScheduledExecutorService proxy which forwards the schedule calls to the
    ActorSystem's internal scheduler.

commit f5a7de2811ef21b55edcb74ad247664d251ac071
Author: Till Rohrmann <[email protected]>
Date:   2017-02-22T16:49:33Z

    Introduce ScheduledExecutor interface to hide service methods from the
    ScheduledExecutorService
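
A narrow interface of this kind might be sketched as follows. This is a guess at the general shape, not the interface added by this commit: NarrowScheduledExecutor and NarrowScheduledExecutorAdapter are hypothetical names, and only one scheduling method is shown. The idea illustrated is that callers can run and schedule work but cannot reach lifecycle methods such as shutdown().

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical narrow view: execution and scheduling only, no lifecycle methods. */
interface NarrowScheduledExecutor extends Executor {
    ScheduledFuture<?> schedule(Runnable command, long delay, TimeUnit unit);
}

/** Hypothetical adapter that forwards to a full ScheduledExecutorService. */
final class NarrowScheduledExecutorAdapter implements NarrowScheduledExecutor {

    private final ScheduledExecutorService delegate;

    NarrowScheduledExecutorAdapter(ScheduledExecutorService delegate) {
        this.delegate = delegate;
    }

    @Override
    public void execute(Runnable command) {
        delegate.execute(command);
    }

    @Override
    public ScheduledFuture<?> schedule(Runnable command, long delay, TimeUnit unit) {
        return delegate.schedule(command, delay, unit);
    }
}
```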

commit 857a8f7e6b363bc4a57d8950a74367e8e8bfe195
Author: Till Rohrmann <[email protected]>
Date:   2017-02-09T10:59:45Z

    [FLINK-5810] [flip6] Introduce a hardened slot manager
    
    Harden the slot manager so that it copes better with lost and out-of-order
    messages from the TaskManagers. The basic idea is that the TaskManagers are
    considered the ground truth, and the SlotManager tries to maintain a
    consistent view of what they report to it. This assumes that the
    TaskManagers regularly report their slot status to the SlotManager,
    piggybacked on the heartbeat signals to the ResourceManager. That way lost
    and out-of-order messages can be handled, because the SlotManager will
    eventually converge on a consistent view of the actual slot allocation.
    
    Additionally, the hardened SlotManager registers a timeout for idle
    TaskManagers and for pending slot requests. If the timeout expires, the
    TaskManagers are released and the slot requests are failed, respectively.
    This prevents resource leaks and wasteful resource allocation.

----

