My current mental model is: configured cpus >= online cpus >= allowed cpus.
In a traditional system they are all the same.
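For concreteness, here is a minimal stand-alone C sketch (glibc assumed, not
JDK code, names are mine) showing how each of the three counts can be
queried on Linux; the fallback to the online count is just a stand-in for
whatever the old code path would do:

    #define _GNU_SOURCE
    #include <sched.h>     /* sched_getaffinity, cpu_set_t, CPU_COUNT */
    #include <stdio.h>
    #include <unistd.h>    /* sysconf */

    int main(void) {
        /* "configured" cpus: all cpus the kernel knows about. */
        long configured = sysconf(_SC_NPROCESSORS_CONF);

        /* "online" cpus: cpus currently brought up; this is roughly what
           availableProcessors reports today. */
        long online = sysconf(_SC_NPROCESSORS_ONLN);

        /* "allowed" cpus: cpus this process may actually run on, i.e. the
           set bits in its affinity mask / cpuset. */
        cpu_set_t s;
        int allowed = (sched_getaffinity(0, sizeof(s), &s) == 0)
                          ? CPU_COUNT(&s)
                          : (int) online;  /* fall back to the online count */

        printf("configured=%ld online=%ld allowed=%d\n",
               configured, online, allowed);
        return 0;
    }

On a traditional system all three print the same value; on a
cpuset-restricted system or in a container the allowed count should be the
smaller one.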
I experimented and saw that cpusets are indeed turned on in some systems
used for testing at Google, i.e. allowed cpus is a strict subset of online
cpus.

It seems likely that the following would be a better implementation of
availableProcessors on Linux, with all the pain in configury:

    cpu_set_t s;
    return (sched_getaffinity(0, sizeof(s), &s) == 0)
        ? CPU_COUNT(&s)
        : fallback_to_old_way();

On Mon, Dec 14, 2015 at 6:58 AM, Mikael Gerdin <mikael.ger...@oracle.com> wrote:
> Hi David,
>
> On 2015-12-11 14:21, David Holmes wrote:
>>
>> On 11/12/2015 11:16 PM, Magnus Ihse Bursie wrote:
>>>
>>> On 2015-12-03 03:11, Roger Riggs wrote:
>>>>
>>>> Hi,
>>>>
>>>> It would be useful to figure out the number of cpus available when
>>>> in a container. Some comments have been added to:
>>>> 8140793 <https://bugs.openjdk.java.net/browse/JDK-8140793>
>>>> getAvailableProcessors may incorrectly report the number of cpus in
>>>> Docker container
>>>>
>>>> But so far we haven't dug deep enough. Suggestions are welcome.
>>>
>>>
>>> http://serverfault.com/questions/691659/count-number-of-allowed-cpus-in-a-docker-container
>>> suggests running nproc. I'm not sure if that can be counted on to be
>>> present, but we could certainly check for it.
>>
>>
>> I'd like to know how nproc does it so we can try to apply the same
>> logic in the VM for Runtime.availableProcessors. Can someone actually
>> confirm that it returns the number of processors available to the
>> container?
>
>
> I don't have a container at hand, but running nproc under strace
> suggests that it calls sched_getaffinity and counts the number of set
> bits in the cpu affinity mask:
>
> $ strace -e trace=sched_getaffinity nproc
> sched_getaffinity(0, 128, {f, 0, 0, 0}) = 32
> 4
> +++ exited with 0 +++
>
> It would be nice if anyone with access to a system where the number of
> cpus is limited in a similar manner to a docker container could run the
> above command and see if it
> 1) returns the correct number of cpus
> 2) works as I think, that is, it counts the number of set bits in the
>    array which is the third syscall argument.
>
> /Mikael
>
>>
>> David
>>
>>> /Magnus
>>>
>>>> Roger
>>>>
>>>>
>>>> On 12/2/15 6:59 PM, Martin Buchholz wrote:
>>>>>
>>>>> Not to say you shouldn't do this, but I worry that increasingly
>>>>> computing is being done in "containers" where e.g. the number of
>>>>> cpus is doubling every year but only a small number are available
>>>>> to actually be used by a given process. If availableProcessors
>>>>> reports 1 million, what should we do? (no need to answer...)
>>>>>
>>>>> On Tue, Dec 1, 2015 at 1:55 AM, Erik Joelsson
>>>>> <erik.joels...@oracle.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> The current heuristic for figuring out what to set the -j flag to
>>>>>> by default for make needs some tweaking.
>>>>>>
>>>>>> In JDK 9, it looks at the amount of memory and the number of cpus
>>>>>> in the system. It divides memory by 1024 to get a safe number of
>>>>>> jobs that will fit into memory. The lower of that number and the
>>>>>> number of cpus is then picked. The number is then scaled down to
>>>>>> about 90% of the number of cpus to leave some resources for other
>>>>>> activities. It is also capped at 16.
>>>>>>
>>>>>> Since we now have the build using "nice" to make sure the build
>>>>>> isn't bogging down the system, I see no reason to do the 90%
>>>>>> scaling anymore.
>>>>>> Also, the performance issues that forced us to cap at 16 have long
>>>>>> been fixed, and even if we don't scale well beyond 16, we do still
>>>>>> scale. So I propose we remove that arbitrary limitation too.
>>>>>>
>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8144312
>>>>>> Webrev: http://cr.openjdk.java.net/~erikj/8144312/webrev.01/
>>>>>>
>>>>>> /Erik
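For reference, the default -j heuristic quoted above boils down to roughly
the following. This is only a sketch of Erik's description, not the actual
build logic, and the function and parameter names are made up:

    /* Hypothetical restatement of the current -j default. */
    int default_make_jobs(long memory_mb, int num_cpus) {
        long by_memory = memory_mb / 1024;               /* one job per GB of RAM */
        long jobs = by_memory < num_cpus ? by_memory : num_cpus;
        jobs = (jobs * 9) / 10;                          /* ~90% of cpus, leaving headroom */
        if (jobs > 16) jobs = 16;                        /* current cap, proposed to be dropped */
        if (jobs < 1)  jobs = 1;
        return (int) jobs;
    }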