On 2015-12-15 19:38, Martin Buchholz wrote:
Actually calling nproc as a separate process at runtime is interesting
but totally unorthodox.

I think the configury pain is the usual: detect sched.h, sched_getaffinity and
CPU_COUNT; don't forget _GNU_SOURCE; check you're on a glibc system (and
probably check at runtime too, so use dlsym to access sched_getaffinity); look
for similar hacks on non-glibc systems.  Worry about systems with more than
1024 cpus.  Worry about sched_getaffinity returning a higher number than the
old way.

Is that enough things to worry about?
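
Roughly, the dlsym part could look like this (just an untested sketch, the
function name is made up; it assumes the "old way" is
sysconf(_SC_NPROCESSORS_ONLN) and needs -ldl on older glibc):

#define _GNU_SOURCE
#include <dlfcn.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* Untested sketch: resolve sched_getaffinity at runtime so the binary still
 * loads on a libc that lacks it; fall back to the old sysconf count otherwise. */
typedef int (*getaffinity_fn)(pid_t, size_t, cpu_set_t *);

static int available_processors(void) {
  getaffinity_fn fn = (getaffinity_fn) dlsym(RTLD_DEFAULT, "sched_getaffinity");
  cpu_set_t s;
  if (fn != NULL && fn(0, sizeof(s), &s) == 0) {
    return CPU_COUNT(&s);                       /* cpus we are allowed to run on */
  }
  return (int) sysconf(_SC_NPROCESSORS_ONLN);   /* the old way */
}

int main(void) {
  printf("%d\n", available_processors());
  return 0;
}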

Are you talking about JDK-6515172? I was thinking about how to implement a proper check in the configure script, where calling a separate process is not so unorthodox after all. ;-)

I'd still like to see some real-world confirmation that nproc does indeed return the correct number of cpus in a Docker environment.

/Magnus



On Tue, Dec 15, 2015 at 5:28 AM, Magnus Ihse Bursie
<magnus.ihse.bur...@oracle.com> wrote:
On 2015-12-15 04:27, Martin Buchholz wrote:
My current mental model is
configured cpus >= online cpus >= allowed cpus
In a traditional system they are all the same.

I experimented and saw that cpusets are indeed turned on in some
systems used for testing at Google.
I.e. allowed cpus is a strict subset of online cpus.
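
For reference, a small standalone program along these lines prints all three
numbers on Linux (just an illustration of the mental model, not JDK code):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
  long configured = sysconf(_SC_NPROCESSORS_CONF);  /* cpus configured in the system */
  long online     = sysconf(_SC_NPROCESSORS_ONLN);  /* cpus currently online */
  long allowed    = -1;
  cpu_set_t s;
  if (sched_getaffinity(0, sizeof(s), &s) == 0) {
    allowed = CPU_COUNT(&s);                        /* cpus this process may run on */
  }
  printf("configured=%ld online=%ld allowed=%ld\n", configured, online, allowed);
  return 0;
}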

It seems likely that the following would be a better implementation of
availableProcessors on Linux:

    cpu_set_t s;
    return (sched_getaffinity(0, sizeof(s), &s) == 0)
        ? CPU_COUNT(&s)
        : fallback_to_old_way();

with all the pain in configury.
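
For the more-than-1024-cpus worry, the fixed-size cpu_set_t could be swapped
for a dynamically sized one; a rough, untested sketch (function name made up)
using the CPU_ALLOC/CPU_COUNT_S variants from glibc 2.7 and later:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Count allowed cpus with a dynamically sized mask, growing it until the
 * kernel accepts the size, so systems with more than 1024 cpus work too.
 * Returns -1 so the caller can fall back to the old way. */
static int allowed_cpu_count(void) {
  for (int ncpus = 1024; ncpus <= 1 << 16; ncpus *= 2) {
    cpu_set_t *set = CPU_ALLOC(ncpus);
    if (set == NULL) {
      break;
    }
    size_t size = CPU_ALLOC_SIZE(ncpus);
    if (sched_getaffinity(0, size, set) == 0) {
      int count = CPU_COUNT_S(size, set);
      CPU_FREE(set);
      return count;
    }
    CPU_FREE(set);   /* EINVAL: mask too small for this kernel, retry larger */
  }
  return -1;
}

int main(void) {
  printf("%d\n", allowed_cpu_count());
  return 0;
}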

Making system calls from configure is more than acceptably difficult. :-)
But if nproc does this, we can do something like checking whether nproc is
present and, if so, whether it returns a non-zero value; if it does, we use it,
otherwise we fall back to the current method. Is that what you're suggesting?

/Magnus



On Mon, Dec 14, 2015 at 6:58 AM, Mikael Gerdin <mikael.ger...@oracle.com>
wrote:
Hi David,

On 2015-12-11 14:21, David Holmes wrote:
On 11/12/2015 11:16 PM, Magnus Ihse Bursie wrote:
On 2015-12-03 03:11, Roger Riggs wrote:
Hi,

It would be useful to figure out the number of cpus available when in a container.
Some comments have been added to:
8140793 <https://bugs.openjdk.java.net/browse/JDK-8140793>: getAvailableProcessors may incorrectly report the number of cpus in Docker container

But so far we haven't dug deep enough. Suggestions are welcome.


http://serverfault.com/questions/691659/count-number-of-allowed-cpus-in-a-docker-container

suggests running nproc. I'm not sure if that can be counted on to be
present, but we could certainly check for it.

I'd like to know how nproc does it so we can try to apply the same logic
in the VM for Runtime.availableProcessors. Can someone actually confirm
that it returns the number of processors available to the container?

I don't have a container at hand but running nproc under strace suggests
that it calls sched_getaffinity and counts the number of set bits in the
cpu
affinity mask:

$ strace -e trace=sched_getaffinity nproc
sched_getaffinity(0, 128, {f, 0, 0, 0}) = 32
4
+++ exited with 0 +++

It would be nice if anyone with access to a system where the number of cpus is
limited in a similar manner to a docker container could run the above command
and see if it
1) returns the correct number of cpus
2) works as I think, that is, counts the number of set bits in the array which
is the third syscall argument.
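
If it helps, here is a small test program (untested sketch, glibc-specific)
that lists the bits set in the affinity mask and their count, so the output can
be compared with what nproc prints inside the container:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
  cpu_set_t s;
  if (sched_getaffinity(0, sizeof(s), &s) != 0) {
    perror("sched_getaffinity");
    return 1;
  }
  /* Count and list the set bits, i.e. the cpus this process may run on. */
  int count = 0;
  for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
    if (CPU_ISSET(cpu, &s)) {
      printf("cpu %d is allowed\n", cpu);
      count++;
    }
  }
  printf("total: %d\n", count);
  return 0;
}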


/Mikael



David

/Magnus

Roger


On 12/2/15 6:59 PM, Martin Buchholz wrote:
Not to say you shouldn't do this, but I worry that increasingly computing is
being done in "containers" where e.g. the number of cpus is doubling every year
but only a small number are available to actually be used by a given process.
If availableProcessors reports 1 million, what should we do?  (no need to
answer...)

On Tue, Dec 1, 2015 at 1:55 AM, Erik Joelsson
<erik.joels...@oracle.com>
wrote:

Hello,

The current heuristic for figuring out the default value of the -j flag passed
to make needs some tweaking.

In JDK 9, it looks at the amount of memory and the number of cpus in the
system. It divides memory by 1024 to get a safe number of jobs that will fit
into memory. The lower of that number and the number of cpus is then picked.
The number is then scaled down to about 90% of the number of cpus to leave some
resources for other activities. It is also capped at 16.
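
In other words, the current default boils down to roughly this (a paraphrase
in C of the heuristic as described, with a made-up function name, assuming
memory is measured in MB so that dividing by 1024 gives about one job per GB;
not the actual build logic):

#include <stdio.h>

/* Rough paraphrase of the described -j default: min(memory-based, cpu count),
 * then scaled down to ~90% and capped at 16. */
static int default_make_jobs(long memory_mb, int num_cpus) {
  long by_memory = memory_mb / 1024;          /* roughly one job per GB */
  long jobs = by_memory < num_cpus ? by_memory : num_cpus;
  jobs = jobs * 9 / 10;                       /* leave ~10% for other activities */
  if (jobs > 16) jobs = 16;                   /* the historical cap */
  if (jobs < 1)  jobs = 1;
  return (int) jobs;
}

int main(void) {
  /* e.g. 32 GB of memory and 24 cpus -> 16 with the current heuristic */
  printf("%d\n", default_make_jobs(32 * 1024, 24));
  return 0;
}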

Since we now have the build using "nice" to make sure the build isn't bogging
down the system, I see no reason to do the 90% scaling anymore. Also, the
performance issues that forced us to cap at 16 have long been fixed, and even
if we don't scale well beyond 16, we do still scale. So I propose we remove
that arbitrary limitation too.

Bug: https://bugs.openjdk.java.net/browse/JDK-8144312
Webrev: http://cr.openjdk.java.net/~erikj/8144312/webrev.01/

/Erik

