Greetings!

On Fri, Nov 9, 2012 at 9:16 PM, Brian Bockelman <[email protected]> wrote:
>
> On Nov 9, 2012, at 11:49 PM, Dan Bradley <[email protected]> wrote:
>
>> Hi all,
>>
>> I was thinking about the lack of support in Condor for setting cpu
>> affinity with partitionable slots in a useful way. We do this on our
>> non-partitionable slots to avoid the inevitable accidents where jobs
>> try to use the whole machine from a single-core slot. We'd like to be
>> able to do it on the partitionable slots.
>>
>> My first question is whether cpu shares in cgroups make the above
>> use-case of cpu affinity obsolete.
>
> Hi Dan,
>
> Yeah, it's pretty fun watching users try to do this at our cluster.
> They don't get particularly far, but they at least soak up any extra
> idle CPU cycles on the node.
>
>> If not, then it would be really nice to have cpu affinity working in
>> the partitionable slot world. The problem is that all dynamic slots
>> under a single partitionable slot get assigned to the same set of
>> cpus. It seems to me that the startd needs to manage the partitioning
>> of the cpu set when it creates the dynamic slots.
>>
>> Are there plans for generic support for this sort of non-fungible
>> resource partitioning? Implementing this specific case does not sound
>> very hard, as long as we (at least initially) just use a first-fit
>> strategy and do not worry about optimizing which cores go best
>> together in case of multi-core jobs. I think it could even be done
>> without adding any new configuration knobs (gasp).
>
> How are you going to assign CPU sets? The traditional syscall route or
> the cgroup route? It strikes me that, if you go the cgroup route, you
> actually could repack processes on different cores later to optimize
> topology.
>
> In the end, because we don't optimize core topology, I don't
> particularly think it is any better than using cgroups (in fact,
> slightly worse, because you may hold cores unnecessarily idle, and you
> cannot oversubscribe a node). I guess it's something for people
> without cgroups?
>
> How does this interact with cgroups? Right now, we have:
>
>     MEMORY_LIMIT=[soft|hard|none]
>
> for the policy of what to do when the job goes over the memory limit.
> Maybe we also have:
>
>     CPU_LIMIT=[soft|hard|none]
>
> ?
>
> Brian
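For concreteness, here is a minimal sketch (Python, purely illustrative;
the function names, the first-fit policy, and the cgroup-v1 paths are
all assumptions, not current startd behavior) of what first-fit core
assignment plus a hard cpuset binding could look like, next to a soft
cpu.shares alternative for the CPU_LIMIT=soft flavor floated above:

```python
import os

# Hypothetical per-partitionable-slot bookkeeping: carve the first N free
# core ids off for a new dynamic slot (first-fit, no topology awareness).
def first_fit(free_cores, ncpus):
    if len(free_cores) < ncpus:
        return None, free_cores              # not enough cores left
    return free_cores[:ncpus], free_cores[ncpus:]

# "Hard" limit via the cgroup-v1 cpuset controller: processes placed in
# the cgroup may only run on the listed cores.
def bind_cpuset(cgroup_dir, cores):
    with open(os.path.join(cgroup_dir, "cpuset.cpus"), "w") as f:
        f.write(",".join(str(c) for c in cores))
    # cpuset also requires mems to be populated before tasks can attach.
    with open(os.path.join(cgroup_dir, "cpuset.mems"), "w") as f:
        f.write("0")

# "Soft" limit via cpu.shares: a proportional weight under contention,
# but idle cycles stay usable by any job.
def set_cpu_shares(cgroup_dir, ncpus):
    with open(os.path.join(cgroup_dir, "cpu.shares"), "w") as f:
        f.write(str(1024 * ncpus))           # 1024 == one core's default weight
```

Note that neither variant does anything about which cores sit together
topologically.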
So, you might also want to look to "numad" from RHEL 6.3+ and recent
Fedoras for inspiration:

    http://git.fedorahosted.org/cgit/numad.git/

It uses cgroups and "fixes" long-running processes (see
bind_process_and_migrate_memory()), but it also provides
"numad -w NCPUS[:MB]", which passes back an argument for use with
numactl to pack reservations in the best way possible.

--
Lans Carstensen
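To make the numad suggestion concrete: a rough sketch (Python; the
helper names are mine, and it assumes numad and numactl are installed
and on PATH) of consuming the "numad -w" advice from a job wrapper:

```python
import shlex
import subprocess

def numad_advice(ncpus, mb=None):
    """Ask numad for pre-placement advice via the -w query described
    above.  numad prints a NUMA node list suitable for numactl."""
    arg = f"{ncpus}:{mb}" if mb is not None else str(ncpus)
    out = subprocess.run(["numad", "-w", arg],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()

def run_packed(cmd, ncpus, mb=None):
    """Launch cmd with its CPUs and memory bound to the advised nodes."""
    nodes = numad_advice(ncpus, mb)
    subprocess.run(["numactl", "-N", nodes, "-m", nodes]
                   + shlex.split(cmd), check=True)

if __name__ == "__main__":
    # e.g. reserve 4 cpus and 2048 MB for a hypothetical job
    run_packed("./my_job --threads 4", 4, 2048)
```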
