[jira] [Commented] (FLINK-3543) Introduce ResourceManager component

Maximilian Michels (JIRA) Thu, 19 May 2016 03:25:50 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15290858#comment-15290858
 ]


Maximilian Michels commented on FLINK-3543:
-------------------------------------------

Hi! Scheduling is not part of the ResourceManager. The JobManager would have to 
deal with these issues.

> Introduce ResourceManager component
> -----------------------------------
>
>                 Key: FLINK-3543
>                 URL: https://issues.apache.org/jira/browse/FLINK-3543
>             Project: Flink
>          Issue Type: New Feature
>          Components: JobManager, ResourceManager, TaskManager
>    Affects Versions: 1.1.0
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>             Fix For: 1.1.0
>
>         Attachments: ResourceManagerSketch.pdf
>
>
> So far the JobManager has been the central instance which is responsible for 
> resource management and allocation.
> While thinking about how to integrate Mesos support in Flink, people from the 
> Flink community realized that it would be nice to delegate resource 
> allocation to a dedicated process. This process may run independently of the 
> JobManager which is a requirement for proper integration of cluster 
> allocation frameworks like Mesos.
> This has led to the idea of creating a new component called the 
> {{ResourceManager}}. Its task is to allocate and maintain resources requested 
> by the {{JobManager}}. The ResourceManager has a very abstract notion of 
> resources.
> Initially, we thought we could make the ResourceManager deal with resource 
> allocation and the registration/supervision of the TaskManagers. However, 
> this approach proved to add unnecessary complexity to the runtime. 
> Registration state of TaskManagers had to be kept in sync at both the 
> JobManager and the ResourceManager.
> That's why [~StephanEwen] and me changed the ResourceManager's role to simply 
> deal with the resource acquisition. The TaskManagers still register with the 
> JobManager which informs the ResourceManager about the successful 
> registration of a TaskManager. The ResourceManager may inform the JobManager 
> of failed TaskManagers. Due to the insight which the ResourceManager has in 
> the resource health, it may detect failed TaskManagers much earlier than the 
> heartbeat-based monitoring of the JobManager.
> At this stage, the ResourceManager is an optional component. That means the 
> JobManager doesn't depend on the ResourceManager as long as it has enough 
> resources to perform the computation. All bookkeeping is performed by the 
> JobManager. When the ResourceManager connects to the JobManager, it receives 
> the current resources, i.e. task manager instances, and allocates more 
> containers if necessary. The JobManager adjusts the number of containers 
> through the {{SetWorkerPoolSize}} method.
> In standalone mode, the ResourceManager may be deactivated or simply use the 
> StandaloneResourceManager which does practically nothing because we don't 
> need to allocate resources in standalone mode.
> In YARN mode, the ResourceManager takes care of communicating with the Yarn 
> resource manager. When containers fail, it informs the JobManager and tries 
> to allocate new containers. The ResourceManager runs as an actor within the 
> same actor system as the JobManager. It could, however, also run 
> independently. The independent mode would be the default behavior for Mesos 
> where the framework master is expected to just deal with resource allocation.
> The attached figures depict the message flow between ResourceManager, 
> JobManager, and TaskManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (FLINK-3543) Introduce ResourceManager component

Reply via email to