Hi everyone,

We have been talking about core affinity in Mesos for a while, and Ian D. has recently been giving this topic thought in his ‘exclusive resources’ proposal [1]. Without it, latency-critical workloads are at risk whenever we try to avoid overly conservative placements. We are interested in the topic through our work on oversubscription in Serenity [2], since the whole point of oversubscription is to be able to colocate latency-critical and best-effort batch jobs.

We had an informal meeting yesterday, going over the proposal and trying to build some momentum behind the capability.

It is a tricky but exciting topic:

- How do we avoid making task launch even more complex? How do we express the topology and acquire parts of it? Do we use hints on the affinity properties instead?
- How do we mix pinned tasks with normal ‘floating’ tasks?
- How do we convey information about task sensitivity to the resource estimator?

Note that the list above is not meant for inline discussion or answers. Let’s collect feedback on the proposals themselves.

Here are our proposed next steps:

- We are going to use the ‘Isolation Working Group’ as an umbrella for this. I will fill in details and members.
- We will schedule an online meeting for Wednesday next week at 9 AM PST to discuss next steps. I will share a hangout link when we get closer.
- The plan is to arrive at designs (maybe more than one) that we agree on, and then to scope out and distribute the work that needs to be done.

Whoever is interested, join us. The use cases for this work are critical. Maybe we can even work on some representative workloads that we can verify our proposal against.

Cheers,
Niklas

PS For comments on the proposal itself, please refer to Ian’s thread on the dev list [3].

[1] https://issues.apache.org/jira/browse/MESOS-4138
[2] https://github.com/mesosphere/serenity
[3] https://www.mail-archive.com/dev%40mesos.apache.org/msg33892.html
