I recently stabilized my plugin's test suite on ci.jenkins.io. The
following is my root cause analysis.

At present there are eight online Ubuntu EC2 agents on ci.jenkins.io.
Three are high-memory agents and five are not:

• EC2 (aws) - High memory ubuntu 18.04 (i-067cdb5c4dd6bbc66)
• EC2 (aws) - High memory ubuntu 18.04 (i-09868363dd8e0e302)
• EC2 (aws) - High memory ubuntu 18.04 (i-0d3e670dcf9448827)
• EC2 (aws) - Ubuntu 18.04 LTS (i-0147db496a4c3205b)
• EC2 (aws) - Ubuntu 18.04 LTS (i-066509d2e6e564444)
• EC2 (aws) - Ubuntu 18.04 LTS (i-06b6dd7739f0fcad8)
• EC2 (aws) - Ubuntu 18.04 LTS (i-0c6752517c9e4dd86)
• EC2 (aws) - Ubuntu 18.04 LTS (i-0d7ea29c5c4d607c6)

Both the high-memory and the regular-memory agents have the "linux"
label, so the Linux branches of my plugin's tests may run on either
kind of agent. I noticed that the branches that happen to run on the
high-memory agents usually pass, but the branches that happen to run
on the regular-memory agents frequently time out.

I added logging and saw that the agent JVM launched by my tests was
sometimes running out of memory and crashing. This in turn caused my
test to time out while waiting for the agent to connect. Why was the
agent JVM running out of memory?

I then added logging to print per-process memory usage during each
test. I discovered that the regular-memory agents have 2 GB of RAM.
Several JVMs run on them in the course of a typical integration test:

• Remoting (with no -Xmx or -Xms)
• Maven (with no -Xmx or -Xms)
• Surefire (with -Xms768M -Xmx768M)
• The agent JVM launched by my tests (with no -Xmx or -Xms)
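
For anyone who wants to reproduce this kind of per-process logging,
here is a minimal sketch. It is not the logging code from my plugin:
the class name ProcessMemoryLogger and its methods are hypothetical,
and it assumes a Linux agent where the procps "ps" command is on the
PATH and where every JVM of interest has "java" in its command line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
public class ProcessMemoryLogger {

    // Runs ps (Linux procps syntax assumed) and returns its output lines,
    // sorted by resident set size (RSS, reported in KiB), largest first.
    static List<String> runPs() throws IOException, InterruptedException {
        Process ps = new ProcessBuilder("ps", "-eo", "pid,rss,args", "--sort=-rss")
                .redirectErrorStream(true)
                .start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(ps.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        ps.waitFor();
        return lines;
    }

    // Keeps only the lines whose command line mentions "java"
    // (assumption: that is enough to find Remoting, Maven, and Surefire).
    static List<String> javaProcesses(List<String> psLines) {
        List<String> result = new ArrayList<>();
        for (String line : psLines) {
            if (line.contains("java")) {
                result.add(line.trim());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (String line : javaProcesses(runPs())) {
            System.out.println(line);
        }
    }
}
```

Calling javaProcesses(runPs()) from a test setup method and printing
the result would give one line per JVM with its PID, RSS, and
arguments.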

Further logging showed that at the time my test started (at which
point the only JVMs running were Remoting, Maven, and Surefire), only
about 400 MB of RAM remained free on the system. Thus it was no
surprise that my agent JVMs were frequently running out of memory.
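
A quick way to check the headroom on a given agent is sketched below.
This is an illustration rather than the code I used: the class name
MemoryHeadroom is hypothetical, and the cast assumes a HotSpot/OpenJDK
runtime that exposes the com.sun.management extension. It also prints
the maximum heap of the JVM it runs in, which is useful because a JVM
started with no -Xmx picks its own default limit (typically a fraction
of physical memory on HotSpot).

```java
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration only.
public class MemoryHeadroom {

    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Assumption: HotSpot/OpenJDK, where the platform bean implements
        // com.sun.management.OperatingSystemMXBean. The two getters below
        // are deprecated on newer JDKs but still present.
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean)
                        ManagementFactory.getOperatingSystemMXBean();

        System.out.println("total physical MiB:    " + toMiB(os.getTotalPhysicalMemorySize()));
        System.out.println("free physical MiB:     " + toMiB(os.getFreePhysicalMemorySize()));
        System.out.println("this JVM max heap MiB: " + toMiB(Runtime.getRuntime().maxMemory()));
    }
}
```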

I worked around the problem by setting

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <argLine>-Xmx256m -Xms256m</argLine>
  </configuration>
</plugin>

in pom.xml and setting "-Xmx64m -Xms64m" for my agent JVMs (in my
tests). With these settings my tests consistently pass, even on the
regular-memory EC2 agents.
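
For the agent JVM side, the idea is simply to put the heap flags on
the command line the test uses to launch the agent. A minimal sketch
follows. The class AgentCommand and its build method are hypothetical,
not my plugin's actual test code, and the agent jar path and agent
arguments are left as parameters because they depend on the test.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
public class AgentCommand {

    // Builds a "java -Xmx64m -Xms64m -jar <agentJar> <agentArgs...>" command.
    // JVM options must come before -jar, or they are passed to the agent
    // program as ordinary arguments instead of being read by the JVM.
    static List<String> build(String javaExecutable, String agentJar, List<String> agentArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaExecutable);
        cmd.add("-Xmx64m");
        cmd.add("-Xms64m");
        cmd.add("-jar");
        cmd.add(agentJar);
        cmd.addAll(agentArgs);
        return cmd;
    }
}
```

A test could then start the agent with
new ProcessBuilder(AgentCommand.build(...)).inheritIO().start().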

I suggest the Jenkins infrastructure team consider adding -Xmx and
-Xms options to the Remoting JVM and/or using EC2 instance types with
more memory.
