My gut feeling is that it won't be very useful to expose LLC as a first-class resource type at this point. It's very hard for the user to pick a sensible amount, and it requires framework support everywhere. Also, as Kevin mentioned, Mesos doesn't pin your tasks, so you don't know which cores your tasks will be running on.

We have been talking about QoS isolators in the performance isolation working group, where more low-level decisions are made on the agent itself. Both core pinning and CAT would be controls which those isolators could adjust to uphold higher level notions of task performance tiers.

Let's discuss this in the performance isolation working group. We can schedule a call end of this week or start next week.

Niklas

On Mon, Apr 4, 2016 at 10:39 PM, Kevin Klues <[email protected]> wrote:

> Hi Fan,
>
> Thanks for putting this together. I have been looking into this quite
> a bit myself recently, and have been slowly preparing a design doc for
> both CAT and CMT support in Mesos. One of the biggest things I have
> been trying to figure out (which is why I haven't pushed my design doc
> out yet) is how to combine CAT support with the existing resource
> model.
>
> Specifically, Mesos currently gives out fractional cores using the
> cgroups cpu.shares mechanism and doesn't allow tasks to choose
> specific cores to run on (even more than this, there is no way for a
> task to even see which specific cores might be available).
> Furthermore, when a resource offer goes out, it's just a collection of
> SCALARS, SETS, and RANGES, and there's no way to tie one particular
> resource to another (e.g. you can't say give me cores and memory that
> are close together to mitigate NUMA effects).
>
> Given these limitations, it's not clear how to take immediate
> advantage of CAT, since it relies on specifying a specific core to
> allocate the cache from. That is, some mechanism must exist to ensure
> that both the CPU and the cache are colocated. This is a problem with
> the current resource model in general, and applies to properly
> supporting NUMA as well.
>
> You seem to propose simply adding cache partitions as a first class
> resource on par with CPUs and memory, with no mention of its
> dependence on particular cores. What are your thoughts on this?
>
> Kevin
>
> On Mon, Apr 4, 2016 at 7:36 PM, Du, Fan <[email protected]> wrote:
> > Hi all,
> >
> > MESOS-5076 has been filed to investigate how Intel Cache Allocation
> > Technology (CAT)[1] could be used in Mesos. Some introduction and
> > early thoughts are documented here[2].
> >
> > The motivation is to:
> > a) Add CAT isolation support for Mesos Containerization
> > b) Expose Last Level Cache (LLC) as Scalar Resource
> > c) Bridge the interface gap for Docker Containerization,
> >    CAT support for Docker[3] has been submitted to Docker OCI with
> >    positive feedback.
> >
> > The ultimate goal is to provide operators with a CAT isolator for
> > better colocation of cluster resources.
> > I'm looking forward to any comments from the community to move this
> > forward.
> >
> > Thanks!
> >
> > [1]: http://www.intel.com/content/www/us/en/communications/cache-monitoring-cache-allocation-technologies.html
> > [2]: https://docs.google.com/document/d/130ay0e2DZ9S61SC3tGcik5wQaC8L40t5tWj3K3GJxTg/edit?usp=sharing
> > [3]: https://github.com/opencontainers/runtime-spec/pull/267
> >      https://github.com/opencontainers/runc/pull/447
>
> --
> ~Kevin

--
Niklas
