Actually calling nproc as a separate process at runtime is interesting but totally unorthodox.
I think the configury pain is the usual: detecting sched.h,
sched_getaffinity, and CPU_COUNT; not forgetting _GNU_SOURCE; checking
that you're on a glibc system, and probably checking at runtime too, so
using dlsym to access sched_getaffinity; and looking for similar hacks
on non-glibc systems. Worry about systems with more than 1024 cpus.
Worry about sched_getaffinity returning a higher number than the old
way. Is that enough things to worry about?

On Tue, Dec 15, 2015 at 5:28 AM, Magnus Ihse Bursie
<magnus.ihse.bur...@oracle.com> wrote:
> On 2015-12-15 04:27, Martin Buchholz wrote:
>>
>> My current mental model is
>> configured cpus >= online cpus >= allowed cpus
>> In a traditional system they are all the same.
>>
>> I experimented and saw that cpusets are indeed turned on in some
>> systems used for testing at Google.
>> I.e. allowed cpus is a strict subset of online cpus.
>>
>> It seems likely that the following would be a better implementation
>> of availableProcessors on Linux:
>>
>> cpu_set_t s;
>> return (sched_getaffinity(0, sizeof(s), &s) == 0) ? CPU_COUNT(&s) :
>> fallback_to_old_way();
>>
>> with all the pain in configury.
>
> Making system calls from configure is more than acceptably difficult. :-)
> But if nproc does this, we can do something like checking if nproc is
> present, and if so, if it returns a non-zero value, we use it;
> otherwise we fall back to the current method. Is that what you're
> suggesting?
>
> /Magnus
>
>> On Mon, Dec 14, 2015 at 6:58 AM, Mikael Gerdin
>> <mikael.ger...@oracle.com> wrote:
>>>
>>> Hi David,
>>>
>>> On 2015-12-11 14:21, David Holmes wrote:
>>>>
>>>> On 11/12/2015 11:16 PM, Magnus Ihse Bursie wrote:
>>>>>
>>>>> On 2015-12-03 03:11, Roger Riggs wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> It would be useful to figure out the number of cpus available
>>>>>> when in a container.
>>>>>> Some comments have been added to:
>>>>>> 8140793 <https://bugs.openjdk.java.net/browse/JDK-8140793>
>>>>>> getAvailableProcessors may incorrectly report the number of cpus
>>>>>> in Docker container
>>>>>>
>>>>>> But so far we haven't dug deep enough. Suggestions are welcome.
>>>>>
>>>>> http://serverfault.com/questions/691659/count-number-of-allowed-cpus-in-a-docker-container
>>>>> suggests running nproc. I'm not sure if that can be counted on to
>>>>> be present, but we could certainly check for it.
>>>>
>>>> I'd like to know how nproc does it so we can try to apply the same
>>>> logic in the VM for Runtime.availableProcessors. Can someone
>>>> actually confirm that it returns the number of processors available
>>>> to the container?
>>>
>>> I don't have a container at hand, but running nproc under strace
>>> suggests that it calls sched_getaffinity and counts the number of
>>> set bits in the cpu affinity mask:
>>>
>>> $ strace -e trace=sched_getaffinity nproc
>>> sched_getaffinity(0, 128, {f, 0, 0, 0}) = 32
>>> 4
>>> +++ exited with 0 +++
>>>
>>> It would be nice if anyone with access to a system where the number
>>> of cpus is limited in a similar manner to a docker container could
>>> run the above command and see if it
>>> 1) returns the correct number of cpus
>>> 2) works as I think, that is, it counts the number of set bits in
>>> the array which is the third syscall argument.
>>>
>>> /Mikael
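
(Spelling out the idea from the snippet quoted above a little more
concretely: this is just an untested sketch, not a proposed patch. It
assumes glibc with _GNU_SOURCE, uses CPU_ALLOC/CPU_COUNT_S so systems
with more than 1024 cpus still work, and falls back to something like
the current sysconf-based count if the affinity call fails. The dlsym
indirection and non-glibc hacks are left out, and available_processors
/ fallback_to_old_way are just placeholder names.)

#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>

/* Roughly the current behavior: count online cpus. */
static int fallback_to_old_way(void) {
  long n = sysconf(_SC_NPROCESSORS_ONLN);
  return n > 0 ? (int) n : 1;
}

int available_processors(void) {
  /* Retry with larger masks: sched_getaffinity fails with EINVAL if
     the kernel's cpu mask is wider than the buffer passed in. */
  for (int ncpus = 1024; ncpus <= 65536; ncpus *= 2) {
    cpu_set_t *set = CPU_ALLOC(ncpus);
    if (set == NULL)
      break;
    size_t size = CPU_ALLOC_SIZE(ncpus);
    if (sched_getaffinity(0, size, set) == 0) {
      int count = CPU_COUNT_S(size, set);   /* count the set bits */
      CPU_FREE(set);
      return count > 0 ? count : fallback_to_old_way();
    }
    CPU_FREE(set);
  }
  return fallback_to_old_way();
}

(CPU_COUNT_S over the returned mask is the same counting of set bits
that nproc appears to do.)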
>>>
>>>> David
>>>>
>>>>> /Magnus
>>>>>
>>>>>> Roger
>>>>>>
>>>>>> On 12/2/15 6:59 PM, Martin Buchholz wrote:
>>>>>>>
>>>>>>> Not to say you shouldn't do this, but I worry that increasingly
>>>>>>> computing is being done in "containers" where e.g. the number of
>>>>>>> cpus is doubling every year but only a small number are available
>>>>>>> to actually be used by a given process. If availableProcessors
>>>>>>> reports 1 million, what should we do? (no need to answer...)
>>>>>>>
>>>>>>> On Tue, Dec 1, 2015 at 1:55 AM, Erik Joelsson
>>>>>>> <erik.joels...@oracle.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> The current heuristic for figuring out what to set the -j flag
>>>>>>>> to make to by default needs some tweaking.
>>>>>>>>
>>>>>>>> In JDK 9, it looks at the amount of memory and the number of
>>>>>>>> cpus in the system. It divides memory by 1024 to get a safe
>>>>>>>> number of jobs that will fit into memory. The lower of that
>>>>>>>> number and the number of cpus is then picked. The number is
>>>>>>>> then scaled down to about 90% of the number of cpus to leave
>>>>>>>> some resources for other activities. It is also capped at 16.
>>>>>>>>
>>>>>>>> Since we now have the build using "nice" to make sure the build
>>>>>>>> isn't bogging down the system, I see no reason to do the 90%
>>>>>>>> scaling anymore. Also, the performance issues that forced us to
>>>>>>>> cap at 16 have long been fixed, and even if we don't scale well
>>>>>>>> beyond 16, we do still scale. So I propose we remove that
>>>>>>>> arbitrary limitation too.
>>>>>>>>
>>>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8144312
>>>>>>>> Webrev: http://cr.openjdk.java.net/~erikj/8144312/webrev.01/
>>>>>>>>
>>>>>>>> /Erik
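
(For reference, the heuristic Erik describes works out to roughly the
arithmetic below. This is a paraphrase for illustration, not the actual
configure code, and default_make_jobs is a made-up name; the 90%
scaling and the cap at 16 are the two parts the change removes.)

/* JDK 9 default for make's -j flag, as described above. */
static int default_make_jobs(int cpus, long memory_mb) {
  long by_memory = memory_mb / 1024;   /* roughly one job per GB of RAM */
  long jobs = cpus < by_memory ? cpus : by_memory;
  jobs = jobs * 9 / 10;                /* scale down to leave some headroom */
  if (jobs > 16)
    jobs = 16;                         /* arbitrary cap */
  if (jobs < 1)
    jobs = 1;
  return (int) jobs;
}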