[
https://issues.apache.org/jira/browse/FLINK-8431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328386#comment-16328386
]
Dongwon Kim commented on FLINK-8431:
------------------------------------
[~eronwright] I'm testing my implementation by launching a standalone Flink
cluster using {{./bin/mesos-appmaster.sh}}. I tested the following scenarios
with Mesos configured with {{--filter_gpu_resources}}.
* *When {{mesos.resourcemanager.tasks.gpus}} is not specified or is set to 0.0*
** {{LaunchCoordinator}} is not given any offers because
{{MesosFlinkResourceManager}} does not enable the {{GPU_RESOURCES}} capability
in this case.
* *When {{mesos.resourcemanager.tasks.gpus}} is less than or equal to the
available GPUs on a node*
** Given offers, {{LaunchCoordinator}} aggregates offers of different roles
from the same node and passes the aggregated offers to Fenzo, which schedules
resources across nodes. When Fenzo reports that scheduling succeeded,
{{LaunchCoordinator}} allocates resources of different roles to tasks and then
populates {{Protos.TaskInfo}} with the allocated resources, which is then sent
to the Mesos master.
* *When {{mesos.resourcemanager.tasks.gpus}} is greater than the available GPUs
on a node*
** Given offers, {{LaunchCoordinator}} aggregates offers of different roles
from the same node and passes the aggregated offers to Fenzo. However, Fenzo
notifies {{LaunchCoordinator}} that scheduling failed, with the following message:
AssignmentFailure \{resource=Other, asking=3.0, used=0.0, available=2.0,
message=gpus}.
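
For illustration, the three scenarios above can be modeled with the simplified sketch below. It is not the actual Flink, Mesos, or Fenzo code: the class {{GpuSchedulingSketch}}, the {{Offer}} type, and all method names are hypothetical, and the logic only assumes what the scenarios describe (the capability is enabled for a positive GPU count, offers from the same node are summed across roles, and a request larger than the sum is rejected).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical, self-contained model of the three scenarios.
// None of these types or methods come from Flink, Mesos, or Fenzo.
class GpuSchedulingSketch {

    /** Simplified offer: the node it came from, the role it belongs to, and its GPU count. */
    static final class Offer {
        final String hostname;
        final String role;
        final double gpus;

        Offer(String hostname, String role, double gpus) {
            this.hostname = hostname;
            this.role = role;
            this.gpus = gpus;
        }
    }

    /** Scenario 1: the GPU capability is requested only for a positive GPU count. */
    static boolean needsGpuCapability(double requestedGpus) {
        return requestedGpus > 0.0;
    }

    /** Sums the offered GPUs per node, across all roles. */
    static Map<String, Double> aggregateGpusByHost(List<Offer> offers) {
        Map<String, Double> total = new LinkedHashMap<>();
        for (Offer offer : offers) {
            total.merge(offer.hostname, offer.gpus, Double::sum);
        }
        return total;
    }

    /** Scenarios 2 and 3: empty on success, otherwise a short failure description. */
    static Optional<String> tryAssign(double requestedGpus, double availableGpus) {
        if (requestedGpus <= availableGpus) {
            return Optional.empty();
        }
        return Optional.of("asking=" + requestedGpus + ", available=" + availableGpus);
    }

    public static void main(String[] args) {
        // Two offers from the same node under different roles, one GPU each.
        List<Offer> offers = Arrays.asList(
                new Offer("agent-1", "*", 1.0),
                new Offer("agent-1", "flink", 1.0));
        double available = aggregateGpusByHost(offers).get("agent-1");

        System.out.println("capability needed for 0 GPUs: " + needsGpuCapability(0.0));
        System.out.println("2 GPUs: " + tryAssign(2.0, available).orElse("assigned"));
        System.out.println("3 GPUs: " + tryAssign(3.0, available).orElse("assigned"));
    }
}
```

The aggregation step matters because a request for two GPUs fits this node only when the offers of both roles are combined.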
> Allow to specify # GPUs for TaskManager in Mesos
> ------------------------------------------------
>
> Key: FLINK-8431
> URL: https://issues.apache.org/jira/browse/FLINK-8431
> Project: Flink
> Issue Type: Improvement
> Components: Cluster Management, Mesos
> Reporter: Dongwon Kim
> Assignee: Dongwon Kim
> Priority: Minor
>
> Mesos provides first-class support for Nvidia GPUs [1], but Flink does not
> exploit it when scheduling TaskManagers. If Mesos agents are configured to
> isolate GPUs as shown in [2], TaskManagers that do not request GPUs
> cannot see any GPUs.
> We therefore need to introduce a new configuration property named
> "mesos.resourcemanager.tasks.gpus" to allow users to specify the number of
> GPUs for each TaskManager process in Mesos.
> [1] http://mesos.apache.org/documentation/latest/gpu-support/
> [2] http://mesos.apache.org/documentation/latest/gpu-support/#agent-flags