Greetings!  On Fri, Nov 9, 2012 at 9:16 PM, Brian Bockelman
<[email protected]> wrote:
>
> On Nov 9, 2012, at 11:49 PM, Dan Bradley <[email protected]> wrote:
>
>> Hi all
>>
>> I was thinking about Condor's lack of useful support for setting cpu
>> affinity with partitionable slots.  We set affinity on our
>> non-partitionable slots to avoid the inevitable accidents where jobs try to
>> use the whole machine from a single-core slot, and we'd like to be able to
>> do the same on the partitionable slots.
>>
>> My first question is whether cpu shares in cgroups make the above use case
>> of cpu affinity obsolete.
>
> Hi Dan,
>
> Yeah, it's pretty fun watching users try to grab the whole machine on our
> cluster.  They don't get particularly far, but they do soak up any extra
> idle CPU cycles on the node.
>
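A side note on the shares question: cpu.shares is a soft, proportional
limit, so it only bites under contention, which is consistent with users
soaking up idle cycles as Brian describes.  A minimal sketch of the write
itself, assuming a cgroup v1 cpu controller mounted at /sys/fs/cgroup/cpu
and a hypothetical per-slot group (this is not existing HTCondor code):

    #include <stdio.h>

    /* Weight a slot proportionally to its cpu count.  1024 is the
     * cgroup v1 default weight for a single "share unit". */
    static int set_cpu_shares(const char *slot, int ncpus)
    {
        char path[256];
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/fs/cgroup/cpu/htcondor/%s/cpu.shares", slot);
        f = fopen(path, "w");
        if (!f)
            return -1;
        fprintf(f, "%d\n", 1024 * ncpus);
        return fclose(f);
    }

Under contention the scheduler divides cpu time by these weights; on an
idle node any group can exceed its share, which is exactly why shares
alone don't replace pinning.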
>>
>> If not, then it would be really nice to have cpu affinity working in the 
>> partitionable slot world.  The problem is that all dynamic slots under a 
>> single partitionable slot get assigned to the same set of cpus.  It seems to 
>> me that the startd needs to manage the partitioning of the cpu set when it 
>> creates the dynamic slots.
>>
>> Are there plans for generic support for this sort of non-fungible resource 
>> partitioning?  Implementing this specific case does not sound very hard, as 
>> long as we (at least initially) just use a first-fit strategy and do not
>> worry about optimizing which cores go best together for multi-core jobs.
>> I think it could even be done without adding any new configuration
>> knobs (gasp).
>>
>
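Dan's first-fit idea seems simple enough.  A rough sketch of the
bookkeeping the startd would need (illustrative only, not existing
HTCondor code; the bitmask caps the machine at 64 cores for brevity):

    #include <stdint.h>

    /* First-fit: carve 'want' cores for a new dynamic slot out of
     * 'free', the partitionable slot's mask of unclaimed cores.
     * Returns the claimed mask, or 0 if the request can't be met. */
    static uint64_t first_fit(uint64_t *free, int want)
    {
        uint64_t got = 0;
        int i, n = 0;

        for (i = 0; i < 64 && n < want; i++) {
            if (*free & (1ULL << i)) {
                got |= 1ULL << i;
                n++;
            }
        }
        if (n < want)
            return 0;      /* not enough cores left */
        *free &= ~got;     /* claim them */
        return got;
    }

When a dynamic slot is destroyed, its mask just gets OR'ed back into
'free'.
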
> How are you going to assign CPU sets?  The traditional syscall route or the 
> cgroup route?  It strikes me that, if you go the cgroup route, you actually 
> could repack processes on different cores later to optimize topology.
>
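To make the two routes concrete, both as sketches (the cgroup path below
is hypothetical): the syscall route pins once, before exec, while the
cgroup route writes a core list that can be rewritten later, which is
what would allow the repacking Brian mentions.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    /* Syscall route: pin the calling (pre-exec) process to one core. */
    static int pin_syscall(int core)
    {
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(core, &set);
        return sched_setaffinity(0, sizeof(set), &set);
    }

    /* cgroup route: constrain the slot's cpuset; rewriting this file
     * later migrates the job's threads, unlike the syscall route. */
    static int pin_cgroup(const char *slot, const char *cores)
    {
        char path[256];
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/fs/cgroup/cpuset/htcondor/%s/cpuset.cpus", slot);
        f = fopen(path, "w");
        if (!f)
            return -1;
        fprintf(f, "%s\n", cores);   /* e.g. "2-3" or "0,4" */
        return fclose(f);
    }
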
> In the end, since we don't optimize core topology, I don't think affinity
> is particularly better than using cgroups (in fact, it's slightly worse:
> you may hold cores unnecessarily idle, and you can't oversubscribe a
> node).  I guess it's something for people without cgroups?
>
> How does this interact with cgroups?  Right now, we have:
>
> MEMORY_LIMIT=[soft|hard|none]
>
> for the policy of what to do when the job goes over the memory limit.  Maybe 
> we also have:
>
> CPU_LIMIT=[soft|hard|none]
>
> ?
>
> Brian
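
A hedged guess at how a CPU_LIMIT knob could map onto the cgroup cpu
controller (no such knob exists today, so this is purely illustrative):

    # hypothetical mapping, not a real HTCondor knob
    #   none:  no cpu controls at all
    #   soft:  cpu.shares proportional to the slot's Cpus
    #   hard:  cpu.cfs_quota_us = Cpus * cpu.cfs_period_us
    CPU_LIMIT = soft

soft would match today's shares behavior; hard would use the CFS bandwidth
quota so a job can never exceed its allocation even on an idle node.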

So, you might also want to look at "numad" from RHEL 6.3+ and recent
Fedora releases for inspiration:

http://git.fedorahosted.org/cgit/numad.git/

It uses cgroups and "fixes up" the placement of long-running processes
(see bind_process_and_migrate_memory()), but it also provides "numad -w
NCPUS[:MB]", which passes back an argument you can hand to numactl to
pack reservations in the best way possible.
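
For example (the node list printed below is illustrative; numad
recommends whatever fits the current load):

    $ numad -w 4:2048
    0
    $ numactl -N 0 -m 0 ./my_job

numad only advises here; the numactl invocation is what actually binds
the job's cpus and memory to the recommended node.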

-- Lans Carstensen
_______________________________________________
HTCondor-devel mailing list
[email protected]
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-devel
