[ 
https://issues.apache.org/jira/browse/MESOS-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris updated MESOS-5342:
-------------------------
    Description: 
The cgroups isolator currently lacks support for binding (also called pinning) 
containers to a set of cores. The GNU/Linux kernel is known to make sub-optimal 
core assignments for processes and threads. Poor assignments impact program 
performance, specifically in terms of cache locality. Applications requiring GPU 
resources can benefit from this feature by getting access to cores closest to 
the GPU hardware, which reduces CPU-GPU copy latency.
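
As a rough illustration of what binding means at the process level, the sketch 
below parses a Linux CPU list string (the format used by cpuset.cpus, e.g. 
"0-3,8") and applies it with Python's os.sched_setaffinity, which is available 
on Linux only. The helper names parse_cpu_list and pin_process are hypothetical 
and do not come from Mesos; an isolator would more likely write to the cpuset 
cgroup than call the affinity syscall directly.

```python
import os
from typing import Set


def parse_cpu_list(spec: str) -> Set[int]:
    """Parse a Linux CPU list such as "0-3,8" into a set of core ids."""
    cores: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Inclusive range, e.g. "0-3" means cores 0, 1, 2 and 3.
            low, high = part.split("-", 1)
            cores.update(range(int(low), int(high) + 1))
        else:
            cores.add(int(part))
    return cores


def pin_process(pid: int, spec: str) -> Set[int]:
    """Bind `pid` (0 means the calling process) to the cores in `spec`.

    Hypothetical helper: Linux-only, and it raises OSError if the
    requested cores are not available to the process.
    """
    os.sched_setaffinity(pid, parse_cpu_list(spec))
    # Read the mask back so the caller can see what the kernel accepted.
    return set(os.sched_getaffinity(pid))
```

Calling pin_process(0, "0-1") would restrict the calling process to cores 0 and 
1, assuming both are present and permitted on the host.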

Most cluster management systems from the HPC community (e.g. SLURM) provide 
both cgroup isolation and CPU binding. This feature would provide similar 
capabilities. The current interest in supporting Intel's Cache Allocation 
Technology will require making choices about where containers are going to run 
on the mesos-agent's processor(s); this feature is a step toward developing a 
robust solution.

The improvement in this JIRA ticket will handle hardware topology detection, 
track container-to-core utilization in a histogram, and use a mathematical 
optimization technique to select cores for container assignment based on 
latency and the container-to-core utilization histogram.

For GPU tasks, the improvement will prioritize selection of cores based on 
latency between the GPU and cores in an effort to minimize copy latency.
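
The selection step described above could look roughly like the following. This 
is a minimal sketch only: the ticket does not name the optimization technique, 
so a greedy weighted score stands in for it here. The function select_cores, 
its parameters, and the cost formula (containers already on a core plus a 
weighted latency term) are all assumptions made for illustration.

```python
from typing import Dict, List


def select_cores(utilization: Dict[int, int],
                 latency: Dict[int, float],
                 count: int,
                 latency_weight: float = 1.0) -> List[int]:
    """Greedily pick `count` cores with the lowest combined cost.

    utilization: histogram of containers already bound to each core.
    latency:     assumed per-core latency to a reference device, such
                 as a GPU; cores missing from the map are treated as 0.
    """
    if count > len(utilization):
        raise ValueError("requested more cores than are available")

    def cost(core: int) -> float:
        return utilization[core] + latency_weight * latency.get(core, 0.0)

    # Sort by cost, breaking ties on core id so the result is deterministic.
    chosen = sorted(utilization, key=lambda c: (cost(c), c))[:count]

    # Record the new assignment in the histogram for later requests.
    for core in chosen:
        utilization[core] += 1
    return sorted(chosen)
```

For a GPU task, the latency map would hold the distance from each core to the 
GPU, and a larger latency_weight would favour nearby cores over lightly loaded 
ones.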

  was:
The cgroups isolator currently lacks support for binding (also called pinning) 
containers to a set of cores. The GNU/Linux kernel is known to make sub-optimal 
core assignments for processes and threads. Poor assignments impact program 
performance, particularly in the case of applications requiring GPU resources. 

Most cluster management systems from the HPC community (e.g. SLURM) provide 
both cgroup isolation and CPU binding. This feature would provide similar 
capabilities. The current interest in supporting Intel's Cache Allocation 
Technology will require making choices about where containers are going to run 
on the mesos-agent's processor(s); this feature is a step toward developing a 
robust solution.

The improvement in this JIRA ticket will handle hardware topology detection, 
track container-to-core utilization in a histogram, and use a mathematical 
optimization technique to select cores for container assignment based on 
latency and the container-to-core utilization histogram.

For GPU tasks, the improvement will prioritize selection of cores based on 
latency between the GPU and cores in an effort to minimize copy latency.


> CPU pinning/binding support for CgroupsCpushareIsolatorProcess
> --------------------------------------------------------------
>
>                 Key: MESOS-5342
>                 URL: https://issues.apache.org/jira/browse/MESOS-5342
>             Project: Mesos
>          Issue Type: Improvement
>          Components: cgroups, containerization
>    Affects Versions: 0.28.1
>            Reporter: Chris
>
> The cgroups isolator currently lacks support for binding (also called 
> pinning) containers to a set of cores. The GNU/Linux kernel is known to make 
> sub-optimal core assignments for processes and threads. Poor assignments 
> impact program performance, specifically in terms of cache locality. 
> Applications requiring GPU resources can benefit from this feature by getting 
> access to cores closest to the GPU hardware, which reduces CPU-GPU copy 
> latency.
> Most cluster management systems from the HPC community (e.g. SLURM) provide 
> both cgroup isolation and CPU binding. This feature would provide similar 
> capabilities. The current interest in supporting Intel's Cache Allocation 
> Technology will require making choices about where containers are going to 
> run on the mesos-agent's processor(s); this feature is a step toward 
> developing a robust solution.
> The improvement in this JIRA ticket will handle hardware topology detection, 
> track container-to-core utilization in a histogram, and use a mathematical 
> optimization technique to select cores for container assignment based on 
> latency and the container-to-core utilization histogram.
> For GPU tasks, the improvement will prioritize selection of cores based on 
> latency between the GPU and cores in an effort to minimize copy latency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
