Thanks for the further info, Erik. I guess we will see how this plays out over time.

Thanks,
David

On 15/11/2018 3:05 am, Erik Joelsson wrote:
On 2018-11-13 20:03, David Holmes wrote:
Hi Erik,

Thanks for all the work you did in trying to stabilize this.

One comment ...

On 14/11/2018 7:34 am, Erik Joelsson wrote:
This patch changes the formula for default test concurrency in RunTest.gmk. The current formula is:

min(cpus/2, 12)

This seems to work well enough on the x64 machines we currently run our tests on, but less so for SPARC. I have now run rather extensive testing in our lab and have come up with a new formula that provides much better test reliability while preserving as much test throughput as possible. The new formula is cpus/4 for SPARC machines with up to 16 cpus and cpus/5 for larger machines. For non-SPARC it's still cpus/2, and I've removed the cap for all.
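For illustration, a rough GNU Make sketch of such a scheme could look something like the following. This is not the actual webrev; it assumes the usual NUM_CORES and OPENJDK_TARGET_CPU_ARCH build system variables, and the rounding to the nearest whole job is a guess.

    # Illustrative sketch only, not the actual RunTest.gmk change. Assumes the
    # usual NUM_CORES and OPENJDK_TARGET_CPU_ARCH build variables; awk is used
    # here just to round the division to the nearest whole job.
    ifeq ($(OPENJDK_TARGET_CPU_ARCH), sparc)
      ifeq ($(shell test $(NUM_CORES) -gt 16 && echo big), big)
        CONCURRENCY := $(shell awk 'BEGIN { printf "%.0f", $(NUM_CORES) / 5 }')
      else
        CONCURRENCY := $(shell awk 'BEGIN { printf "%.0f", $(NUM_CORES) / 4 }')
      endif
    else
      CONCURRENCY := $(shell awk 'BEGIN { printf "%.0f", $(NUM_CORES) / 2 }')
    endif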

I'm surprised that you removed the cap and that it is okay. IIRC we had problems with machines that have a large number of CPUs but only a medium amount of RAM; too high a concurrency level would result in memory exhaustion.

Dan brought this up too in chat when I first suggested it. I looked through all available machines in Mach5. There are 2 non-SPARC machines that have enough CPUs to be affected by the cap, and they have 256GB of RAM, which is plenty. The rest are SPARC and have at least 1GB of RAM per CPU, usually more. So with the proposed scheme, I can't see anything really changing with regard to JOBS vs RAM. My testing did not reveal any such problems once I scaled down concurrency enough. Also note that the biggest SPARC we have has 64 CPUs, which now translates into 13 jobs, just 1 more than the previous cap.

We do have a separate issue with a few Macs that have low RAM relative to their CPU count (4GB and 8 CPUs), and I intend to attack that next. My plan is basically to do something similar to what configure does for build jobs (which is JOBS = min(cpus, RAM in GB)). The exact formula is to be determined. I suspect it's going to involve a constant for the RAM part to make room for the test harness, so something like (RAM - k)/x.
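Purely as a sketch of that idea (nothing in this webrev; MEMORY_SIZE_GB is an assumed variable for the machine's RAM in GB, and HARNESS_RESERVE_GB and GB_PER_JOB are arbitrary placeholders for the undecided constants k and x), it could end up looking something like:

    # Hypothetical follow-up, not part of this patch. MEMORY_SIZE_GB is assumed
    # to hold the machine's RAM in GB. HARNESS_RESERVE_GB and GB_PER_JOB stand
    # in for the constants k and x above; the values are placeholders only.
    HARNESS_RESERVE_GB := 2
    GB_PER_JOB := 1
    MEMORY_AWARE_LIMIT := $(shell awk 'BEGIN { printf "%d", ($(MEMORY_SIZE_GB) - $(HARNESS_RESERVE_GB)) / $(GB_PER_JOB) }')
    # Use the smaller of the CPU based and memory based limits.
    ifeq ($(shell test $(MEMORY_AWARE_LIMIT) -lt $(CONCURRENCY) && echo yes), yes)
      CONCURRENCY := $(MEMORY_AWARE_LIMIT)
    endif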

/Erik

Thanks,
David

In addition to this, since SPARC machines generally have lower per-thread performance, at least when running JDK tests, I have bumped the default timeout factor from 4 to 8 for SPARC.

With these defaults, we were able to remove a lot of special cases for SPARC in other parts of our configurations, and I was able to get clean runs of all the lower tiers of testing on each of our machine classes in the lab.

In addition to this, the test compiler/jsr292/ContinuousCallSiteTargetChange.java, which had its timeout increased in JDK-8212028, no longer needs an increased timeout with the new defaults.

Bug: https://bugs.openjdk.java.net/browse/JDK-8211727

Webrev: http://cr.openjdk.java.net/~erikj/8211727/webrev.01/

/Erik
