Hi everyone,

We have been talking about core affinity in Mesos for a while, and Ian D. has 
recently been giving this topic thought in his ‘exclusive resources’ proposal 
[1].
Without it, latency-critical workloads are at risk whenever we try to avoid 
overly conservative placements.
We are interested in the topic through our work on oversubscription in Serenity 
[2], since the whole point of oversubscription is to be able to colocate 
latency-critical and best-effort batch jobs.
We had an informal meeting yesterday, going over the proposal and trying to 
build some momentum behind the capability.

It is a tricky but exciting topic:
 - How do we avoid making task launch even more complex? How do we express the 
topology and acquire parts of it? Do we use hints on the affinity properties 
instead?
 - How do we mix pinned tasks with normal ‘floating’ tasks?
 - How do we convey information about task sensitivity to the resource 
estimator?

Note: the list above is not meant for inline discussion or answers. Let’s 
collect feedback on the proposals themselves.

Here are our proposed next steps:
 - We are going to use the ‘Isolation Working Group’ as an umbrella for this. I 
will fill in details and members.
 - We will schedule an online meeting for Wednesday next week at 9 AM PST to 
discuss next steps. I will share a hangout link when we get closer.
 - The plan is to arrive at designs (maybe more than one) that we agree on, and 
then scope out and distribute the work that needs to be done.

Whoever is interested, please join us. The use cases for this work are 
critical. Maybe we can even work on some representative workloads that we can 
verify our proposals against.

Cheers,
Niklas

PS: For comments on the proposal itself, please refer to Ian’s thread on the 
dev list [3].

[1] https://issues.apache.org/jira/browse/MESOS-4138
[2] https://github.com/mesosphere/serenity
[3] https://www.mail-archive.com/dev%40mesos.apache.org/msg33892.html
