I recently stabilized my plugin's test suite on ci.jenkins.io. The following is my root cause analysis.
At present there are eight online Ubuntu EC2 agents on ci.jenkins.io. Three of these are high memory and five are not:

• EC2 (aws) - High memory ubuntu 18.04 (i-067cdb5c4dd6bbc66)
• EC2 (aws) - High memory ubuntu 18.04 (i-09868363dd8e0e302)
• EC2 (aws) - High memory ubuntu 18.04 (i-0d3e670dcf9448827)
• EC2 (aws) - Ubuntu 18.04 LTS (i-0147db496a4c3205b)
• EC2 (aws) - Ubuntu 18.04 LTS (i-066509d2e6e564444)
• EC2 (aws) - Ubuntu 18.04 LTS (i-06b6dd7739f0fcad8)
• EC2 (aws) - Ubuntu 18.04 LTS (i-0c6752517c9e4dd86)
• EC2 (aws) - Ubuntu 18.04 LTS (i-0d7ea29c5c4d607c6)

Both the high memory and the regular memory agents have the "linux" label, so the Linux branches of my plugin's tests may run on either kind of agent. I noticed that the branches of my tests that happen to run on the high memory agents usually pass, but the branches that happen to run on the regular memory agents frequently time out. I added additional logging and saw that the agent JVM being launched by my tests was sometimes running out of memory and crashing. This in turn was causing my test to time out waiting for the agent to connect.

Why was the agent JVM running out of memory? I added additional logging to print memory usage by process during each test. I discovered that the regular memory agents have 2 GB of RAM. They run several JVMs in the course of a typical integration test:

• Remoting (with no -Xmx or -Xms)
• Maven (with no -Xmx or -Xms)
• surefire (with -Xms768M -Xmx768M)
• The agent JVM launched by my tests (with no -Xmx or -Xms)

I added additional logging and determined that at the time my test started (at which point the only JVMs running were Remoting, Maven, and surefire), only about 400 MB of RAM remained free on the system. Thus it was no surprise that my agent JVMs were frequently running out of memory.
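
For anyone who wants to check the heap limits of their own JVMs, a small helper along these lines might be a starting point. This is only a sketch and not the logging I actually used: the class name MemoryReport and its methods are hypothetical, and it reports only the heap of the JVM it runs in, not memory usage by process. For a JVM started with no -Xmx, the reported maximum is whatever default the JVM picked for the machine.

```java
// Hypothetical helper (not the logging from the original analysis):
// reports the heap limits of the JVM it is running in.
final class MemoryReport {

    /** Converts a byte count to whole mebibytes. */
    static long toMb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Builds a one-line summary of this JVM's heap ceiling and current usage. */
    static String heapSummary() {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();          // heap ceiling: -Xmx if given, else the JVM default
        long committed = rt.totalMemory();  // heap currently reserved by this JVM
        long used = committed - rt.freeMemory();
        return String.format("heap max=%d MB committed=%d MB used=%d MB",
                toMb(max), toMb(committed), toMb(used));
    }

    public static void main(String[] args) {
        System.out.println(heapSummary());
    }
}
```

Calling heapSummary() at the start of a test, in each JVM of interest, would show whether a process with no -Xmx has a ceiling larger than the memory actually left on the machine.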
I worked around the problem by setting

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <configuration>
        <argLine>-Xmx256m -Xms256m</argLine>
      </configuration>
    </plugin>

in pom.xml and setting "-Xmx64m -Xms64m" for my agent JVMs (in my tests). With these settings my tests consistently pass, even on the regular memory EC2 agents.

I suggest the Jenkins infrastructure team consider adding -Xmx and -Xms options to the Remoting JVM and/or using EC2 instance types with more memory.
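
For the agent JVM heap settings mentioned above, one way a test could pin the heap when launching a child JVM is sketched below. This is an illustration under assumptions, not the code from my plugin: the class AgentLauncher, the method agentCommand, and the "agent.jar" path are hypothetical placeholders, and the real launch arguments will differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: builds a java command line with a fixed heap size.
final class AgentLauncher {

    /** Returns a command that starts a JVM with -Xms and -Xmx both set to heapMb. */
    static List<String> agentCommand(String javaBin, int heapMb,
                                     String agentJar, List<String> agentArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-Xms" + heapMb + "m"); // initial heap size
        cmd.add("-Xmx" + heapMb + "m"); // maximum heap size
        cmd.add("-jar");
        cmd.add(agentJar);              // placeholder path, an assumption
        cmd.addAll(agentArgs);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = agentCommand("java", 64, "agent.jar",
                Collections.<String>emptyList());
        // Only print the command here; a test would pass it to
        // new ProcessBuilder(cmd).start() to actually launch the JVM.
        System.out.println(String.join(" ", cmd));
        // prints: java -Xms64m -Xmx64m -jar agent.jar
    }
}
```

Setting -Xms equal to -Xmx keeps the heap size fixed for the life of the process, which makes the child JVM's memory footprint easier to budget on a small machine.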