Hi Fan,

Thanks for putting this together. I have been looking into this quite
a bit myself recently, and have been slowly preparing a design doc for
both CAT and CMT support in Mesos. One of the biggest things I have
been trying to figure out (which is why I haven't pushed my design doc
out yet) is how to combine CAT support with the existing resource
model.

Specifically, Mesos currently gives out fractional cores using the
cgroups cpu.shares mechanism and doesn't allow tasks to choose
specific cores to run on (in fact, a task cannot even see which
specific cores are available). Furthermore, when a resource offer
goes out, it's just a collection of SCALARS, SETS, and RANGES, and
there's no way to tie one particular resource to another (e.g. you
can't say "give me cores and memory that are close together" to
mitigate NUMA effects).
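
To make the flat-offer point concrete, here is a minimal sketch in
Python. It is a simplified model, not the Mesos protobuf API: the
class names, the helper scalar_total, and the sample values are all
made up for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple, Union


@dataclass(frozen=True)
class Scalar:
    name: str
    value: float


@dataclass(frozen=True)
class Ranges:
    name: str
    ranges: Tuple[Tuple[int, int], ...]


@dataclass(frozen=True)
class SetResource:
    name: str
    items: FrozenSet[str]


Resource = Union[Scalar, Ranges, SetResource]

# A hypothetical offer: a flat list of independent resources.
# Nothing here says which cores back "cpus" or which NUMA node
# backs "mem", so two entries cannot be tied to each other.
offer: List[Resource] = [
    Scalar("cpus", 2.5),
    Scalar("mem", 4096.0),
    Ranges("ports", ((31000, 32000),)),
    SetResource("disks", frozenset({"sda1"})),
]


def scalar_total(resources: List[Resource], name: str) -> float:
    """Sum every scalar resource with the given name."""
    return sum(
        r.value for r in resources
        if isinstance(r, Scalar) and r.name == name
    )


if __name__ == "__main__":
    print(scalar_total(offer, "cpus"))
```

All a framework can do with such an offer is take amounts by name;
there is no field through which one entry could refer to another.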

Given these limitations, it's not clear how to take immediate
advantage of CAT, since it relies on naming a specific core to
allocate the cache from. That is, some mechanism must exist to ensure
that the CPU and the cache are colocated. This is a problem with the
current resource model in general, and it applies to properly
supporting NUMA as well.
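
A rough sketch of why the core matters, again in Python. It assumes
the core-based CAT model, where each core is associated with a class
of service (COS) and each COS has a capacity bitmask over the cache
ways. The 4-core, 8-way layout and the function names are
hypothetical and only meant to illustrate the problem.

```python
from typing import Dict, Iterable


def ways(mask: int) -> int:
    """Number of cache ways enabled in a capacity bitmask."""
    return bin(mask).count("1")


def reachable_mask(
    allowed_cores: Iterable[int],
    core_to_cos: Dict[int, int],
    cos_to_mask: Dict[int, int],
) -> int:
    """Union of the bitmasks of every core the task may run on."""
    mask = 0
    for core in allowed_cores:
        mask |= cos_to_mask[core_to_cos[core]]
    return mask


# Hypothetical machine: cores 0-1 use COS 0, cores 2-3 use COS 1.
core_to_cos = {0: 0, 1: 0, 2: 1, 3: 1}
# COS 0 owns the upper four ways, COS 1 the lower four.
cos_to_mask = {0: 0xF0, 1: 0x0F}

# A task pinned to cores 2-3 stays inside the COS 1 partition.
pinned = reachable_mask([2, 3], core_to_cos, cos_to_mask)

# A task scheduled only via cpu.shares may run on any core, so it
# can touch both partitions and the isolation is lost.
unpinned = reachable_mask([0, 1, 2, 3], core_to_cos, cos_to_mask)

if __name__ == "__main__":
    print(hex(pinned), ways(pinned))
    print(hex(unpinned), ways(unpinned))
```

Under these assumptions, a cache partition only means something for
a task whose set of cores is known and fixed, which is exactly what
the current offer cannot express.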

You seem to propose simply adding cache partitions as a first-class
resource on par with CPUs and memory, with no mention of their
dependence on particular cores. What are your thoughts on this?

Kevin

On Mon, Apr 4, 2016 at 7:36 PM, Du, Fan <[email protected]> wrote:
> Hi all,
>
> MESOS-5076 is filed to investigate how Intel Cache Allocation
> Technology (CAT) [1] could be used in Mesos. An introduction and
> some early thoughts are documented here [2].
>
> The motivation is to:
> a) Add CAT isolation support for Mesos Containerization
> b) Expose Last Level Cache (LLC) as a Scalar Resource
> c) Bridge the interface gap for Docker Containerization;
>    CAT support for Docker [3] has been submitted to Docker OCI with
>    positive feedback.
>
> The ultimate goal is to provide operators with a CAT isolator for
> better colocation of cluster resources.
> I'm looking forward to any comments from the community to move this
> forward.
>
> Thanks!
>
> [1]:http://www.intel.com/content/www/us/en/communications/cache-monitoring-cache-allocation-technologies.html
> [2]:https://docs.google.com/document/d/130ay0e2DZ9S61SC3tGcik5wQaC8L40t5tWj3K3GJxTg/edit?usp=sharing
> [3]:https://github.com/opencontainers/runtime-spec/pull/267
>     https://github.com/opencontainers/runc/pull/447
>



-- 
~Kevin
