Thanks for the heads-up, Kevin and Niklas!

Exposing LLC as a resource requires special modifications to the current resource management and offer behavior.
Here are my early thoughts:
1) The 'cpu' resource is essentially a CPU-share resource, while LLC is a per-processor resource.
  This will require:
  1a) Resource offers for cpu and LLC have to be NUMA-node aware.
       For example, take an agent with two NUMA nodes and 2x40 logical CPU cores, and suppose the LLC on each node has 20 subsets.
       The master will then make two resource offers:
       Offer1: cpu 40; LLC 20 with NUMA 1
       Offer2: cpu 40; LLC 20 with NUMA 2

From a high-level point of view, all the RDT-related features require Mesos to be aware of the hardware topology when managing resources; e.g., memory bandwidth will also become a resource type. In any case, making this happen is a long-term goal.

  1b) The agent will apply cpu-share isolation along with cpuset.
       We might need to revisit MESOS-314 to partially support this.

Actually, the kernel's CAT support could (and should) handle the case where a task migrates between NUMA nodes, but right now it does not. This is why I filed the ticket and drafted the initial design doc to track this.

2) The monitoring support (CMT and MBM) is almost ready in Mesos; it is all perf-based.
Check MESOS-4955 and MESOS-4595 for details.
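As a rough sketch of the NUMA-aware offer split in 1a), the agent's resources could be partitioned into one offer per NUMA node, so that cpu and LLC in the same offer always come from the same node. This assumes a hypothetical per-node topology description. The `NumaNode` and `Offer` types and `numa_aware_offers` are illustrative only and are not Mesos APIs:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NumaNode:
    """Hypothetical per-node topology record discovered on the agent."""
    node_id: int
    cpus: int         # logical CPU cores on this node
    llc_subsets: int  # schedulable LLC partitions on this node


@dataclass(frozen=True)
class Offer:
    """Hypothetical offer whose resources all come from one NUMA node."""
    cpus: int
    llc: int
    numa_node: int


def numa_aware_offers(nodes: List[NumaNode]) -> List[Offer]:
    """Emit one offer per NUMA node instead of one flat offer per agent,
    so cpu and LLC granted together are guaranteed to be node-local."""
    return [Offer(cpus=n.cpus, llc=n.llc_subsets, numa_node=n.node_id)
            for n in nodes]


# The two-node, 2x40-core agent from the example in 1a).
agent = [NumaNode(node_id=1, cpus=40, llc_subsets=20),
         NumaNode(node_id=2, cpus=40, llc_subsets=20)]

for offer in numa_aware_offers(agent):
    print(offer)
# Offer(cpus=40, llc=20, numa_node=1)
# Offer(cpus=40, llc=20, numa_node=2)
```

The trade-off is that a task needing more than 40 cpus could no longer be satisfied by a single offer from this agent.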




On 2016/4/5 23:13, Niklas Nielsen wrote:
My gut feeling is that it won't be very useful to expose LLC as a first-class
resource type at this point.
It's very hard for the user to pick and requires framework support
everywhere.
Also, as Kevin mentioned, Mesos doesn't pin your tasks, so you don't know
which cores your tasks will be running on.

We have been talking about QoS isolators in the performance isolation
working group, where more low-level decisions are made on the agent itself.
Both core pinning and CAT would be controls that those isolators could
adjust to uphold higher-level notions of task performance tiers.

Let's discuss this in the performance isolation working group. We can
schedule a call at the end of this week or the start of next week.

Niklas


On Mon, Apr 4, 2016 at 10:39 PM, Kevin Klues <[email protected]> wrote:

Hi Fan,

Thanks for putting this together. I have been looking into this quite
a bit myself recently, and have been slowly preparing a design doc for
both CAT and CMT support in Mesos. One of the biggest things I have
been trying to figure out (which is why I haven't pushed my design doc
out yet) is how to combine CAT support with the existing resource
model.

Specifically, Mesos currently gives out fractional cores using the
cgroups cpu.shares mechanism and doesn't allow tasks to choose
specific cores to run on (even more than this, there is no way for a
task to even see which specific cores might be available).
Furthermore, when a resource offer goes out, it's just a collection of
SCALARS, SETS, and RANGES, and there's no way to tie one particular
resource to another (e.g. you can't say give me cores and memory that
are close together to mitigate NUMA effects).
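The following minimal sketch illustrates why fractional cpus carry no notion of core identity. It translates cpus into cpu.shares, assuming the 1024-shares-per-CPU convention (the cgroups v1 default weight) and the kernel's minimum of 2 shares. The function names are hypothetical:

```python
from typing import List

CPU_SHARES_PER_CPU = 1024  # weight for one full CPU (cgroups v1 default)
MIN_CPU_SHARES = 2         # smallest value the kernel accepts for cpu.shares


def cpu_shares(cpus: float) -> int:
    """Map a fractional 'cpus' scalar to a cpu.shares weight.
    No core ID appears anywhere: the result is only a relative weight."""
    return max(int(CPU_SHARES_PER_CPU * cpus), MIN_CPU_SHARES)


def contended_fractions(shares: List[int]) -> List[float]:
    """Under contention, each cgroup gets CPU time roughly in proportion
    to its shares, on whichever cores the scheduler picks."""
    total = sum(shares)
    return [s / total for s in shares]


tasks = [0.5, 1.5]                      # cpus requested by two tasks
weights = [cpu_shares(c) for c in tasks]
print(weights)                          # [512, 1536]
print(contended_fractions(weights))     # [0.25, 0.75]
```

Because the weight says nothing about *where* a task runs, a per-core mechanism like CAT has nothing to attach to.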

Given these limitations, it's not clear how to take immediate
advantage of CAT, since it relies on specifying a specific core to
allocate the cache from. That is, some mechanism must exist to ensure
that both the CPU and the cache are colocated.  This is a problem with
the current resource model in general, and applies to properly
supporting NUMA as well.
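The following minimal sketch illustrates the colocation problem. It assumes Intel CAT's hardware model: each logical core is associated with a class of service (CLOS), and each CLOS has a capacity bitmask (CBM) over the LLC ways whose set bits must be contiguous. The 20-way LLC, the core-to-CLOS mapping, and all function names are illustrative assumptions:

```python
from typing import Dict, Set


def make_cbm(num_ways: int, offset: int, total_ways: int) -> int:
    """Build a capacity bitmask (CBM) with `num_ways` contiguous set bits
    starting at bit `offset`, for an LLC with `total_ways` ways."""
    if num_ways < 1 or offset < 0 or offset + num_ways > total_ways:
        raise ValueError("requested ways do not fit in the LLC")
    return ((1 << num_ways) - 1) << offset


def is_contiguous(cbm: int) -> bool:
    """True if the set bits of `cbm` form a single contiguous run."""
    if cbm <= 0:
        return False
    lowest = (cbm & -cbm).bit_length() - 1  # index of the lowest set bit
    run = cbm >> lowest                     # shift the run down to bit 0
    return run & (run + 1) == 0             # only 0b1...1 passes


def partition_covers_task(task_cpus: Set[int],
                          clos_of_cpu: Dict[int, int],
                          clos: int) -> bool:
    """The CBM of `clos` only governs code running on cores associated
    with `clos`, so the task's cache partition holds only if every core
    it may run on carries that CLOS."""
    return all(clos_of_cpu.get(cpu) == clos for cpu in task_cpus)


cbm = make_cbm(num_ways=4, offset=4, total_ways=20)
print(hex(cbm), is_contiguous(cbm))                   # 0xf0 True
clos_of_cpu = {0: 1, 1: 1, 2: 0, 3: 0}                # cores 0-1 use CLOS 1
print(partition_covers_task({0, 1}, clos_of_cpu, 1))  # True
print(partition_covers_task({1, 2}, clos_of_cpu, 1))  # False
```

With cpu.shares alone, `task_cpus` is effectively "any core", so the last check cannot be guaranteed without some form of pinning.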

You seem to propose simply adding cache partitions as a first class
resource on par with CPUs and memory, with no mention of its
dependence on particular cores.  What are your thoughts on this?

Kevin

On Mon, Apr 4, 2016 at 7:36 PM, Du, Fan <[email protected]> wrote:
Hi all,

MESOS-5076 has been filed to investigate how Intel Cache Allocation
Technology (CAT) [1] could be used in Mesos. An introduction and some
early thoughts are documented here [2].

The motivation is to:
a) Add CAT isolation support for Mesos containerization
b) Expose Last Level Cache (LLC) as a scalar resource
c) Bridge the interface gap for Docker containerization;
    CAT support for Docker [3] has been submitted to OCI with positive
    feedback.

The ultimate goal is to provide operators with a CAT isolator for better
colocation of cluster resources.
I'm looking forward to any comments from the community to move this forward.

Thanks!

[1]: http://www.intel.com/content/www/us/en/communications/cache-monitoring-cache-allocation-technologies.html
[2]: https://docs.google.com/document/d/130ay0e2DZ9S61SC3tGcik5wQaC8L40t5tWj3K3GJxTg/edit?usp=sharing
[3]: https://github.com/opencontainers/runtime-spec/pull/267
     https://github.com/opencontainers/runc/pull/447




--
~Kevin



