Thanks for the further info, Erik. I guess we will see how this plays out
over time.
Thanks,
David
On 15/11/2018 3:05 am, Erik Joelsson wrote:
On 2018-11-13 20:03, David Holmes wrote:
Hi Erik,
Thanks for all the work you did in trying to stabilize this.
One comment ...
On 14/11/2018 7:34 am, Erik Joelsson wrote:
This patch changes the formula for default test concurrency in
RunTest.gmk. The current formula is:
min(cpus/2, 12)
This seems to work well enough on the x64 machines we currently run
our tests on, but less so on SPARC. I have now run rather extensive
testing in our lab and have come up with a new formula that provides
much better test reliability while preserving as much test throughput
as possible. The new formula is cpus/4 for SPARC machines with up to 16
CPUs and cpus/5 for larger ones. For non-SPARC it's still cpus/2, and
I've removed the cap for all platforms.
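To make the shape of that concrete, here is a rough makefile-level
sketch of the new default. This is illustration only, not the actual
RunTest.gmk change: the variable names are stand-ins and I have assumed
round-up division (so that, for example, a 64-CPU SPARC comes out at 13
jobs); the webrev has the real code.

    # Illustrative sketch only, not the actual RunTest.gmk patch.
    # NUM_CORES and OPENJDK_TARGET_CPU stand in for whatever the real
    # makefiles provide; the defaults just make the snippet runnable.
    NUM_CORES ?= 8
    OPENJDK_TARGET_CPU ?= x86_64

    ifeq ($(findstring sparc,$(OPENJDK_TARGET_CPU)),sparc)
      ifeq ($(shell test $(NUM_CORES) -le 16; echo $$?),0)
        # SPARC with up to 16 CPUs: cpus/4
        CONCURRENCY := $(shell echo $$(( ($(NUM_CORES) + 3) / 4 )))
      else
        # Larger SPARC: cpus/5 (64 CPUs -> 13 jobs with round-up)
        CONCURRENCY := $(shell echo $$(( ($(NUM_CORES) + 4) / 5 )))
      endif
    else
      # Non-SPARC: cpus/2, with no upper cap any more
      CONCURRENCY := $(shell echo $$(( $(NUM_CORES) / 2 )))
    endif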
I'm surprised that you removed the cap and that it is okay. IIRC we had
problems with machines that have a lot of CPUs but only moderate amounts
of RAM; too high a concurrency level would result in memory exhaustion.
Dan brought this up too in chat when I first suggested it. I looked
through all available machines in Mach5. There are 2 non-SPARC machines
with enough CPUs to be affected by the cap, and they have 256GB of RAM,
which is plenty. The rest are SPARC, and they have at least 1GB of RAM
per CPU, usually more. So with the proposed scheme, I can't see anything
really changing with regard to JOBS vs RAM. My testing did not reveal
any such problems once I scaled down concurrency enough. Also note that
the biggest SPARC we have has 64 CPUs, which will now translate into 13
jobs, just 1 more than the previous cap.
We do have a separate issue with a few Macs that have low RAM relative
to their CPU count (4GB and 8 CPUs), and I intend to attack that next.
My plan is basically to do something similar to what configure does for
build jobs (which is JOBS=min(cpus, RAM in GB)). The exact formula is
still to be determined; I suspect it will involve a constant for the RAM
part, to leave room for the test harness, so something like (RAM - k)/x.
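Just to sketch that idea (not real code; MEMORY_GB, K_GB and X_GB are
made-up names and the constants are guesses), it could end up looking
something like:

    # Hypothetical sketch: cap test jobs by available memory as well as
    # CPUs, roughly like configure does for build jobs.
    MEMORY_GB ?= 8
    NUM_CORES ?= 8
    # Placeholder: GB reserved for the test harness itself
    K_GB := 2
    # Placeholder: assumed GB of RAM needed per concurrent test job
    X_GB := 1

    MEM_JOBS := $(shell echo $$(( ($(MEMORY_GB) - $(K_GB)) / $(X_GB) )))
    CPU_JOBS := $(shell echo $$(( $(NUM_CORES) / 2 )))

    # TEST_JOBS = min(CPU_JOBS, MEM_JOBS)
    ifeq ($(shell test $(MEM_JOBS) -lt $(CPU_JOBS); echo $$?),0)
      TEST_JOBS := $(MEM_JOBS)
    else
      TEST_JOBS := $(CPU_JOBS)
    endif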
/Erik
Thanks,
David
In addition to this, since SPARC generally has lower per-thread
performance, at least when running JDK tests, I have bumped the default
timeout factor from 4 to 8 for SPARC.
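For reference, the timeout factor is just a multiplier jtreg applies to
each test's base timeout. As a rough sketch of how such a default gets
selected and passed along (the variable names here are illustrative;
-timeoutFactor itself is a standard jtreg option):

    # Illustrative only: pick a platform-dependent default timeout
    # factor and pass it to jtreg via its -timeoutFactor option.
    ifeq ($(findstring sparc,$(OPENJDK_TARGET_CPU)),sparc)
      TIMEOUT_FACTOR ?= 8
    else
      TIMEOUT_FACTOR ?= 4
    endif

    JTREG_OPTIONS += -timeoutFactor:$(TIMEOUT_FACTOR)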
With these defaults, we were able to remove a lot of special cases for
SPARC in other parts of our configurations, and I was able to get clean
runs of all the lower tiers of testing on each of our machine classes
in the lab.
In addition to this, the test
compiler/jsr292/ContinuousCallSiteTargetChange.java, which had its
timeout increased in JDK-8212028, no longer needs an increased
timeout with the new defaults.
Bug: https://bugs.openjdk.java.net/browse/JDK-8211727
Webrev: http://cr.openjdk.java.net/~erikj/8211727/webrev.01/
/Erik