[
https://issues.apache.org/jira/browse/FLINK-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172214#comment-15172214
]
ASF GitHub Bot commented on FLINK-3544:
---------------------------------------
GitHub user mxm opened a pull request:
https://github.com/apache/flink/pull/1741
[FLINK-3544] Introduce ResourceManager component
So far the JobManager has been the central instance which is responsible
for resource management and allocation.
While thinking about how to integrate Mesos support in Flink, people from
the Flink community realized that it would be nice to delegate resource
allocation to a dedicated process. This process may run independently of the
JobManager which is a requirement for proper integration of cluster allocation
frameworks like Mesos.
This has led to the idea of creating a new component called the
ResourceManager. Its task is to allocate and maintain resources requested by
the JobManager. The ResourceManager has a very abstract notion of resources.
Initially, we thought we could make the ResourceManager deal with resource
allocation and the registration/supervision of the TaskManagers. However, this
approach proved to add unnecessary complexity to the runtime. Registration
state of TaskManagers had to be kept in sync at both the JobManager and the
ResourceManager.
That's why @StephanEwen and me changed the ResourceManager's role to simply
deal with the resource acquisition. The TaskManagers still register with the
JobManager which informs the ResourceManager about the successful registration
of a TaskManager. The ResourceManager may inform the JobManager of failed
TaskManagers. Due to the insight which the ResourceManager has in the resource
health, it may detect failed TaskManagers much earlier than the heartbeat-based
monitoring of the JobManager.
At this stage, the ResourceManager is an optional component. That means the
JobManager doesn't depend on the ResourceManager as long as it has enough
resources to perform the computation. All bookkeeping is performed by the
JobManager. When the ResourceManager connects to the JobManager, it receives
the current resources, i.e. task manager instances, and allocates more
containers if necessary. The JobManager adjusts the number of containers
through the SetWorkerPoolSize method.
In standalone mode, the ResourceManager may be deactivated or simply use
the StandaloneResourceManager which does practically nothing because we don't
need to allocate resources in standalone mode.
In YARN mode, the ResourceManager takes care of communicating with the Yarn
resource manager. When containers fail, it informs the JobManager and tries to
allocate new containers. The ResourceManager runs as an actor within the same
actor system as the JobManager. It could, however, also run independently. The
independent mode would be the default behavior for Mesos where the framework
master is expected to just deal with resource allocation.
The attached figures depict the message flow between ResourceManager,
JobManager, and TaskManager.
[ResourceManagerSketch.pdf](https://github.com/apache/flink/files/151425/ResourceManagerSketch.pdf)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mxm/flink resourceManager-v2-squashed
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/1741.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1741
----
commit 9a32d5caff88b63566365637e6f6176f5bbe83a7
Author: Maximilian Michels <[email protected]>
Date: 2016-02-09T10:20:42Z
[FLINK-3544][runtime] Introduce ResourceManager component
commit 26d1b77b4157e001d9fbdea1fd214f9f31c14bbb
Author: Maximilian Michels <[email protected]>
Date: 2016-02-24T08:20:42Z
[FLINK-3545][yarn] integrate ResourceManager support
----
> ResourceManager runtime components
> ----------------------------------
>
> Key: FLINK-3544
> URL: https://issues.apache.org/jira/browse/FLINK-3544
> Project: Flink
> Issue Type: Sub-task
> Components: ResourceManager
> Affects Versions: 1.1.0
> Reporter: Maximilian Michels
> Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)