On Wednesday 2018-04-04 15:30, Anatoly Pugachev wrote:
>
>Can someone tell me or suggest why getconf returns the total CPU count
>available to the physical machine, and not the LDOM-allocated
>processor/vcpu count?
>
>ttip$ getconf -a | grep PROCESSORS
>_NPROCESSORS_CONF                  256
>_NPROCESSORS_ONLN                  16

It's how the hypervisor populates the MDESC info passed to domains.
It even does that for control domains. Almost makes one want to believe
the 6-core T1s could be functional 8-core ones. (Who knows.)


jengelh@a1:~$ lscpu
Architecture:        sparc64 (T1000)
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Big Endian
CPU(s):              24
On-line CPU(s) list: 0-23

Anyhow, by declaring nr_cpus_present=256 but only nr_cpus_online=16, domains can
get more vCPUs assigned at a later time without rebooting the OS. That makes
sense if one assumes that the MDESC is meant to be static during runtime and/or
that the OS cannot handle a change of nr_cpus_present (there are possibly some
allocations of exactly this size all over the place), similar to NR_CPUS.
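
To see the gap between the configured and the online count on a given box, a
minimal sketch in Python, assuming a Linux system that exposes the usual CPU
lists under /sys/devices/system/cpu. The helper names parse_cpu_list and
read_cpu_list are made up for illustration; how the sysfs lists map to what
getconf prints depends on the libc, so treat the output as a comparison aid only.

```python
import os


def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0-3,8-11" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or stray comma
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_cpu_list(name, base="/sys/devices/system/cpu"):
    """Return the CPUs listed in <base>/<name>, or None if the file is unreadable."""
    try:
        with open(os.path.join(base, name)) as fh:
            return parse_cpu_list(fh.read())
    except OSError:
        return None


if __name__ == "__main__":
    # Counts as the kernel reports them via sysfs.
    for name in ("possible", "present", "online"):
        cpus = read_cpu_list(name)
        print(name, "unknown" if cpus is None else len(cpus))
    # The same two values that getconf reports.
    print("_NPROCESSORS_CONF", os.sysconf("SC_NPROCESSORS_CONF"))
    print("_NPROCESSORS_ONLN", os.sysconf("SC_NPROCESSORS_ONLN"))
```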



>I'm raising this issue because some userspace tools use nproc to run
>parallel make, for example. And starting from the 4.15+ kernel (but not on
>4.14), overcommitted CPU usage (for example, using make -j256 on an LDOM
>with 16 vcpus allocated) gets me to the following (reproducible):

I consider defaulting to nr_cpus_present totally wrong for any software.
CPUs can be offlined by the administrator at will.

Second, I would also consider nr_cpus_online to be a subpar default. Some
institutions (or the load balancing system in use there) use
sched_setaffinity as a primitive measure to statically ensure that login shells
never run on the same core as a number cruncher task.
`make -j$(getconf _NPROCESSORS_ONLN)` would be dumb there because only e.g. 2
cores are available for shells/gcc - the other 14 are assigned to HPC tasks.
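
To illustrate what a better default could look like, a minimal sketch in
Python, assuming Linux, where os.sched_getaffinity is available. The function
names usable_cpu_count and job_count are hypothetical, and this is only one
possible policy: an explicit request wins, otherwise the affinity mask of the
calling process decides.

```python
import os


def usable_cpu_count():
    """Number of CPUs the calling process is allowed to run on."""
    try:
        # Honors masks set with sched_setaffinity/taskset (Linux-only call).
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Platform without sched_getaffinity: fall back to the CPU count.
        return os.cpu_count() or 1


def job_count(requested=None):
    """Pick a parallel job count; an explicit request always wins."""
    if requested is not None:
        if requested < 1:
            raise ValueError("job count must be at least 1")
        return requested
    return usable_cpu_count()


if __name__ == "__main__":
    print("make -j%d" % job_count())
```

Run from a shell restricted to 2 cores, this would suggest -j2 instead of the
-j16 that the online count gives.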
