[
https://issues.apache.org/jira/browse/FLINK-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15290858#comment-15290858
]
Maximilian Michels commented on FLINK-3543:
-------------------------------------------
Hi! Scheduling is not part of the ResourceManager. The JobManager would have to
deal with these issues.
> Introduce ResourceManager component
> -----------------------------------
>
> Key: FLINK-3543
> URL: https://issues.apache.org/jira/browse/FLINK-3543
> Project: Flink
> Issue Type: New Feature
> Components: JobManager, ResourceManager, TaskManager
> Affects Versions: 1.1.0
> Reporter: Maximilian Michels
> Assignee: Maximilian Michels
> Fix For: 1.1.0
>
> Attachments: ResourceManagerSketch.pdf
>
>
> So far the JobManager has been the central instance which is responsible for
> resource management and allocation.
> While thinking about how to integrate Mesos support in Flink, people from the
> Flink community realized that it would be nice to delegate resource
> allocation to a dedicated process. This process may run independently of the
> JobManager, which is a requirement for proper integration of cluster
> allocation frameworks like Mesos.
> This has led to the idea of creating a new component called the
> {{ResourceManager}}. Its task is to allocate and maintain resources requested
> by the {{JobManager}}. The ResourceManager has a very abstract notion of
> resources.
> Initially, we thought we could make the ResourceManager deal with resource
> allocation and the registration/supervision of the TaskManagers. However,
> this approach proved to add unnecessary complexity to the runtime.
> Registration state of TaskManagers had to be kept in sync at both the
> JobManager and the ResourceManager.
> That's why [~StephanEwen] and I changed the ResourceManager's role to simply
> deal with the resource acquisition. The TaskManagers still register with the
> JobManager, which informs the ResourceManager about the successful
> registration of a TaskManager. The ResourceManager may inform the JobManager
> of failed TaskManagers. Because the ResourceManager has insight into
> resource health, it may detect failed TaskManagers much earlier than the
> heartbeat-based monitoring of the JobManager.
> At this stage, the ResourceManager is an optional component. That means the
> JobManager doesn't depend on the ResourceManager as long as it has enough
> resources to perform the computation. All bookkeeping is performed by the
> JobManager. When the ResourceManager connects to the JobManager, it receives
> the current resources, i.e. task manager instances, and allocates more
> containers if necessary. The JobManager adjusts the number of containers
> through the {{SetWorkerPoolSize}} method.
> In standalone mode, the ResourceManager may be deactivated or simply use the
> StandaloneResourceManager, which does practically nothing because we don't
> need to allocate resources in standalone mode.
> In YARN mode, the ResourceManager takes care of communicating with the YARN
> resource manager. When containers fail, it informs the JobManager and tries
> to allocate new containers. The ResourceManager runs as an actor within the
> same actor system as the JobManager. It could, however, also run
> independently. The independent mode would be the default behavior for Mesos,
> where the framework master is expected to just deal with resource allocation.
> The attached figures depict the message flow between ResourceManager,
> JobManager, and TaskManager.
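
The division of labor in the quoted description can be illustrated with a minimal sketch. It is not Flink code: the class and method names below are hypothetical stand-ins, and the real components exchange actor messages rather than direct calls. It only models the bookkeeping described above: the JobManager keeps the registration state, and the ResourceManager requests whatever is missing to reach the worker pool size.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class JobManager:
    """Hypothetical stand-in: holds all bookkeeping of registered TaskManagers."""
    task_managers: set = field(default_factory=set)

    def register_task_manager(self, tm_id: str) -> None:
        self.task_managers.add(tm_id)

    def notify_task_manager_failed(self, tm_id: str) -> None:
        self.task_managers.discard(tm_id)


@dataclass
class ResourceManager:
    """Hypothetical stand-in: acquires resources only, keeps counters but no registry."""
    pool_size: int = 0   # desired number of workers
    started: int = 0     # workers the JobManager reported as registered
    pending: int = 0     # containers requested but not yet registered
    job_manager: Optional[JobManager] = None

    def connect(self, job_manager: JobManager) -> None:
        # On connect, take over the JobManager's current resources,
        # then request more containers if the pool is too small.
        self.job_manager = job_manager
        self.started = len(job_manager.task_managers)
        self._request_missing()

    def set_worker_pool_size(self, size: int) -> None:
        # Modeled after the SetWorkerPoolSize request in the description.
        # This sketch only grows the pool; it never releases containers.
        self.pool_size = size
        self._request_missing()

    def task_manager_registered(self, tm_id: str) -> None:
        # The JobManager informs us about a successful registration.
        self.started += 1
        if self.pending > 0:
            self.pending -= 1

    def container_failed(self, tm_id: str) -> None:
        # Report the failure to the JobManager and request a replacement.
        self.started -= 1
        if self.job_manager is not None:
            self.job_manager.notify_task_manager_failed(tm_id)
        self._request_missing()

    def _request_missing(self) -> None:
        missing = self.pool_size - self.started - self.pending
        if missing > 0:
            self.pending += missing  # stands in for a real container request
```

For example, connecting a ResourceManager with a pool size of 3 to a JobManager that already has one registered TaskManager leaves two container requests pending. A later container failure removes the TaskManager from the JobManager's bookkeeping and adds one more pending request.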