Re: GPU Users -- Deprecation of GPU_RESOURCES capability

Olivier Sallou Mon, 22 May 2017 06:15:07 -0700


On 05/21/2017 03:45 AM, Kevin Klues wrote:
> Hello GPU users,
>
> We are currently considering deprecating the requirement that frameworks
> register with the GPU_RESOURCES capability in order to receive offers that
> contain GPUs. Going forward, we will recommend that users rely on Mesos's
> builtin `reservation` mechanism to achieve similar results.
>
> Before deprecating it, we wanted to get a sense from the community if
> anyone is currently relying on this capability and would like to see it
> persist. If not, we will begin deprecating it in the next Mesos release and
> completely remove it in Mesos 2.0.
Well, I am using it for the GoDocker framework, where jobs can specify
whether or not to use some GPUs.
>
> As background, the original motivation for this capability was to keep
> “legacy” frameworks from inadvertently scheduling jobs that don’t require
> GPUs on GPU capable machines and thus starving out other frameworks that
> legitimately want to place GPU jobs on those machines. The assumption here
> was that most machines in a cluster won't have GPUs installed on them, so
> some mechanism was necessary to keep legacy frameworks from scheduling jobs
> on those machines. In essence, it provided an implicit reservation of GPU
> machines for "GPU aware" frameworks, bypassing the traditional
> `reservation` mechanism already built into Mesos.
>
> In such a setup, legacy frameworks would be free to schedule jobs on
> non-GPU machines, and "GPU aware" frameworks would be free to schedule GPU
> jobs on GPU machines and other types of jobs on other machines (or mix and
> match them however they please).
>
> However, the problem comes when *all* machines in a cluster contain GPUs
> (or even if most of the machines in a cluster contain them). When this is
> the case, we have the opposite of the problem we were trying to solve by
> introducing the GPU_RESOURCES capability in the first place. We end up
> starving out jobs from legacy frameworks that *don’t* require GPU resources
> because there are not enough machines available that don’t have GPUs on
> them to service those jobs. We've actually seen this problem manifest in
> the wild at least once.
>
> An alternative to completely deprecating the GPU_RESOURCES flag would be to
> add a new flag to the Mesos master called `--filter-gpu-resources`. When
> set to `true`, this flag would cause the Mesos master to continue to
> function as it does today. That is, it would filter offers containing GPU
> resources and only send them to frameworks that opt into the GPU_RESOURCES
> framework capability. When set to `false`, this flag would cause the master
> to *not* filter offers containing GPU resources, and indiscriminately send
> them to all frameworks whether they set the GPU_RESOURCES capability or not.
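To check that I read the proposed flag correctly, here is a minimal Python
sketch of the filtering rule described above. The names `Framework`, `Offer`
and `should_send` are made up for illustration and are not Mesos code; only
`GPU_RESOURCES` and the flag name come from the proposal.

```python
from dataclasses import dataclass, field

GPU_RESOURCES = "GPU_RESOURCES"  # capability name taken from the proposal


@dataclass
class Framework:
    # Hypothetical stand-in for a registered framework.
    name: str
    capabilities: set = field(default_factory=set)


@dataclass
class Offer:
    # Hypothetical stand-in for a resource offer, e.g. {"cpus": 4, "gpus": 2}.
    agent: str
    resources: dict


def should_send(offer, framework, filter_gpu_resources):
    """Decide whether the master sends `offer` to `framework`.

    `filter_gpu_resources` models the proposed --filter-gpu-resources flag.
    """
    contains_gpus = offer.resources.get("gpus", 0) > 0
    if not contains_gpus:
        # Offers without GPUs are never affected by the flag.
        return True
    if not filter_gpu_resources:
        # Flag set to false: GPU offers go to every framework.
        return True
    # Flag set to true (today's behavior): only opted-in frameworks.
    return GPU_RESOURCES in framework.capabilities
```

With the flag set to `true`, a framework registered without `GPU_RESOURCES`
never sees an offer that contains GPUs; with `false`, it sees every offer.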
>
> For anyone currently relying on the capability, this flag would allow them
> to keep relying on it without disruption.
>
> We'd prefer to deprecate the capability completely, but would consider
> adding this flag if people are currently relying on the GPU_RESOURCES
> capability and would like to see it persist.
>
> We welcome any feedback you have.
>
> Kevin + Ben
>


-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438
