On Nov 12, 2012, at 9:11 AM, Tim St Clair <[email protected]> wrote:

> of note: current cpu_shares (which only exists on master) uses SlotWeight, 
> where I think it should really be TotalSlotCpus.
> 

Matt called me out on the above also; SlotWeight is probably more flexible, but 
using it here possibly overloads its meaning.  I'm a bit ambivalent and would 
be fine with changing it.

> Open Questions:
> Does anyone have a good way of *really* testing cpu_shares? 
> What does over-subscription and fractions actually mean, or do we want to 
> stick with whole numbers?  
> ++How does this^ affect performance?  
> 

I have no good way to *really* test it out (whatever that means), but we've 
used cpu_shares for the last 6 months and have anecdotal evidence.

It has to be an integer value.  As far as I can tell, only the relative shares 
among sibling cgroups matter.  For example, /condor/ has cpu_shares=1024 (the 
default), but /condor/<job ID> has cpu_shares=100 for each job.
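
To make the "relative to siblings" point concrete, here is a minimal sketch of 
the arithmetic, assuming a simple proportional-share model under full 
contention (it is not the kernel's scheduler code).  The "system" sibling and 
the job names are hypothetical; only the 1024 and 100 values come from the 
setup described above.

```python
def sibling_fraction(shares, name):
    """Fraction of the parent's CPU time that `name` gets when every
    sibling is busy: its shares divided by the sum over all siblings."""
    return shares[name] / sum(shares.values())

# Top level: /condor competes with a hypothetical sibling cgroup.
top_level = {"condor": 1024, "system": 1024}

# Inside /condor: four hypothetical jobs, each with cpu_shares=100.
jobs = {"job_1": 100, "job_2": 100, "job_3": 100, "job_4": 100}

condor_fraction = sibling_fraction(top_level, "condor")
job_fraction = sibling_fraction(jobs, "job_1")

# A job's share of the whole machine is the product along its path.
effective_fraction = condor_fraction * job_fraction

# Scaling every job's shares by the same factor changes nothing, because
# only the ratio among siblings matters; 100 is never compared with 1024.
scaled_jobs = {name: 10 * value for name, value in jobs.items()}
scaled_job_fraction = sibling_fraction(scaled_jobs, "job_1")
```

In this model each job gets a quarter of whatever /condor receives, whether the 
per-job value is 100 or 1000.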

We've had a few cases where someone would send a multicore job but only request 
1 CPU.  In this case, we've verified:
1) If the system is busy, the multicore job gets only 1 core worth of CPU time 
(the amount allocated).
2) If cycles would otherwise go unused, the multicore job gets those.
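
The two observations above match what a work-conserving proportional-share 
model predicts.  The following is a rough sketch of that model, not the 
kernel's algorithm: the function name, the job names, the 8-core machine, and 
the "demand" figures are all assumptions made for illustration.

```python
def allocate_cores(total_cores, jobs):
    """Split total_cores among jobs in proportion to their shares.

    jobs maps a job name to (shares, demand), where demand is the number
    of cores the job could keep busy.  No job gets more than its demand,
    and capacity a job cannot use is re-offered to the others.
    """
    alloc = {name: 0.0 for name in jobs}
    active = set(jobs)
    remaining = float(total_cores)
    eps = 1e-9
    while active and remaining > eps:
        total_shares = sum(jobs[n][0] for n in active)
        # Offer each unsatisfied job its proportional slice of what is left.
        offers = {n: remaining * jobs[n][0] / total_shares for n in active}
        satisfied = {n for n in active if jobs[n][1] <= offers[n] + eps}
        if not satisfied:
            # Every remaining job wants more than its slice: hand out slices.
            for n in active:
                alloc[n] = offers[n]
            break
        for n in satisfied:
            alloc[n] = float(jobs[n][1])
            remaining -= jobs[n][1]
        active -= satisfied
    return alloc

# Busy machine: one multicore job that asked for 1 CPU, plus 7 single-core jobs.
busy = {"multi": (100, 8)}
busy.update({f"single_{i}": (100, 1) for i in range(7)})
busy_alloc = allocate_cores(8, busy)

# Mostly idle machine: the same multicore job next to only 2 single-core jobs.
quiet = {"multi": (100, 8), "single_0": (100, 1), "single_1": (100, 1)}
quiet_alloc = allocate_cores(8, quiet)
```

Under these assumptions the multicore job ends up with 1 core in the busy case 
and 6 cores in the quiet case, which is case 1) and case 2) respectively.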

So, it works as described.  That's the good news.

The bad news is that we have seen CPU-scheduler-related kernel panics on RHEL 
6.3; while quite rare, I think they're cgroups-related.  Maybe one a week?

Brian

_______________________________________________
HTCondor-devel mailing list
[email protected]
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-devel
