[jira] [Commented] (FLINK-8431) Allow to specify # GPUs for TaskManager in Mesos

Dongwon Kim (JIRA) Tue, 16 Jan 2018 23:23:20 -0800

    [ 
https://issues.apache.org/jira/browse/FLINK-8431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328386#comment-16328386
 ]


Dongwon Kim commented on FLINK-8431:
------------------------------------

[~eronwright] I'm testing my implementation by launching a standalone Flink 
cluster using {{./bin/mesos-appmaster.sh}}. I tested the following scenarios 
with Mesos configured with {{--filter_gpu_resources}}.
 * *When {{mesos.resourcemanager.tasks.gpus}} is not specified or is set to 0.0*
 ** {{LaunchCoordinator}} isn't given any offers because 
{{MesosFlinkResourceManager}} does not enable the {{GPU_RESOURCES}} capability 
in this case.
 * *When {{mesos.resourcemanager.tasks.gpus}} is less than or equal to the 
available GPUs on a node* 
 ** Given offers, {{LaunchCoordinator}} aggregates the offers of different 
roles from the same node and passes the aggregated offers to Fenzo, which 
schedules resources across nodes. When Fenzo reports that scheduling succeeded, 
{{LaunchCoordinator}} allocates resources of the different roles to tasks and 
populates a {{Protos.TaskInfo}} with the allocated resources, which is then 
sent to the Mesos master.
 * *When {{mesos.resourcemanager.tasks.gpus}} is greater than the available 
GPUs on a node* 
 ** Given offers, {{LaunchCoordinator}} aggregates the offers of different 
roles from the same node and passes the aggregated offers to Fenzo. However, 
Fenzo notifies {{LaunchCoordinator}} that scheduling failed, with the following 
message:
     AssignmentFailure \{resource=Other, asking=3.0, used=0.0, available=2.0, 
message=gpus}.
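
The three scenarios above can be summarized in a small, self-contained sketch. 
This is not the Flink, Fenzo, or Mesos code: the class, enum, and method names 
are hypothetical, and the sketch only models the outcomes described above under 
the assumption that per-role GPU offers from one node are simply summed before 
being compared with the requested amount.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the three tested scenarios; none of these names
// exist in Flink, Fenzo, or Mesos.
public class GpuSchedulingSketch {

    enum Outcome { NO_OFFERS, SCHEDULED, ASSIGNMENT_FAILURE }

    // Sums the GPUs offered under different roles by the same node
    // (assumption: aggregation is a plain sum).
    static double aggregateGpus(List<Double> gpusPerRoleOffer) {
        double total = 0.0;
        for (double gpus : gpusPerRoleOffer) {
            total += gpus;
        }
        return total;
    }

    // requestedGpus stands in for mesos.resourcemanager.tasks.gpus.
    static Outcome schedule(double requestedGpus, List<Double> gpusPerRoleOffer) {
        if (requestedGpus <= 0.0) {
            // Scenario 1: the GPU capability is not enabled, so with
            // GPU filtering turned on no offers arrive at all.
            return Outcome.NO_OFFERS;
        }
        double available = aggregateGpus(gpusPerRoleOffer);
        // Scenario 2 if the request fits on the node, scenario 3 otherwise.
        return requestedGpus <= available
                ? Outcome.SCHEDULED
                : Outcome.ASSIGNMENT_FAILURE;
    }

    public static void main(String[] args) {
        List<Double> node = Arrays.asList(1.0, 1.0); // two roles, 2 GPUs total
        System.out.println(schedule(0.0, node));
        System.out.println(schedule(2.0, node));
        System.out.println(schedule(3.0, node)); // asking=3.0, available=2.0
    }
}
```

The last call mirrors the failure message above, where 3 GPUs are requested 
but only 2 are available on the node.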

> Allow to specify # GPUs for TaskManager in Mesos
> ------------------------------------------------
>
>                 Key: FLINK-8431
>                 URL: https://issues.apache.org/jira/browse/FLINK-8431
>             Project: Flink
>          Issue Type: Improvement
>          Components: Cluster Management, Mesos
>            Reporter: Dongwon Kim
>            Assignee: Dongwon Kim
>            Priority: Minor
>
> Mesos provides first-class support for Nvidia GPUs [1], but Flink does not 
> exploit it when scheduling TaskManagers. If Mesos agents are configured to 
> isolate GPUs as shown in [2], TaskManagers that do not request GPUs cannot 
> see any GPUs at all.
> We therefore need to introduce a new configuration property named 
> "mesos.resourcemanager.tasks.gpus" that lets users specify the number of 
> GPUs for each TaskManager process in Mesos.
> [1] http://mesos.apache.org/documentation/latest/gpu-support/
> [2] http://mesos.apache.org/documentation/latest/gpu-support/#agent-flags



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
