[jira] [Commented] (SUREFIRE-1719) Race condition results in "VM crash or System.exit called?" failure

Tibor Digana (Jira) Fri, 03 Apr 2020 11:59:07 -0700


    [ 
https://issues.apache.org/jira/browse/SUREFIRE-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17074820#comment-17074820
 ]


Tibor Digana commented on SUREFIRE-1719:
----------------------------------------

[~tigran]
99% of these logs are useless. You can simply call {{System.exit(0)}} in your 
test and you will get the same logs that many other people on this planet 
have, with a totally different root cause.
So you should check out the 
[commit|https://issues.apache.org/jira/browse/SUREFIRE-1719?focusedCommentId=17004616&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17004616]
 and build the plugin locally using {{mvn install -DskipTests}}, then run the 
tests in your project in offline mode with {{mvn -o test}}.
You may still get an error, but its root cause may be different from what 
[~paulmillar] had. How can you prove that you have his root cause?

> Race condition results in "VM crash or System.exit called?" failure
> -------------------------------------------------------------------
>
>                 Key: SUREFIRE-1719
>                 URL: https://issues.apache.org/jira/browse/SUREFIRE-1719
>             Project: Maven Surefire
>          Issue Type: Bug
>          Components: Maven Surefire Plugin
>    Affects Versions: 2.20, 2.20.1, 2.21.0, 2.22.0, 2.22.1, 2.22.2, 3.0.0-M2, 
> 3.0.0-M1, 3.0.0-M3
>            Reporter: Paul Millar
>            Assignee: Tibor Digana
>            Priority: Major
>             Fix For: 3.0.0-M5
>
>         Attachments: build-error-debug.out, build.out, pom.xml
>
>
> After upgrading surefire in our project (dCache) from 2.19.1 to 3.0.0-M3, 
> unit tests started to fail with the message "ExecutionException The forked VM 
> terminated without properly saying goodbye. VM crash or System.exit called?"
> For reference, the command I am using to verify this problem is "mvn -am -pl 
> modules/common clean package" and the surefire configuration is:
> {{<plugin>}}
> {{  <groupId>org.apache.maven.plugins</groupId>}}
> {{  <artifactId>maven-surefire-plugin</artifactId>}}
> {{  <configuration>}}
> {{    <includes>}}
> {{      <include>**/*Test.class</include>}}
> {{      <include>**/*Tests.class</include>}}
> {{    </includes>}}
> {{    <!-- dCache uses the singleton anti-pattern in way}}
> {{    too many places. That unfortunately means we have}}
> {{    to accept the overhead of forking each test run. -->}}
> {{    <forkCount>1C</forkCount>}}
> {{    <reuseForks>false</reuseForks>}}
> {{  </configuration>}}
> {{</plugin>}}
> [The complete pom.xml is attached.]
> This problem is not always present. On our build machine, I've seen the 
> problem appear 6 out of 10 times when running the above mvn command. There is 
> (apparently) little that seems to influence whether the build will succeed or 
> fail.
> [I've attached the complete output from running the above mvn command, both 
> the normal output and including the -e -X options.]
> The problem seems to appear only on machines with a "large" number of cores. 
> Our build machine has 24 cores, and I've seen a report of a similar problem 
> when building dCache on a 48 core machine. On the other hand, I have been 
> unable to reproduce the problem with my desktop machine (8 core) or on my 
> laptop (4 cores).
> What seems to matter is the number of actually running JVM instances.
> I have not been able to reproduce the problem by increasing the forkCount on 
> a machine with a small number of cores. However, I've noticed that, on an 8 
> core machine, increasing the forkCount does not actually result in that many 
> more JVM instances running.
> Similarly, experience shows that reducing the number of concurrent JVM 
> instances "fixes" the problem. A forkCount of 6 seems to bring the likelihood 
> of a problem below 10% (0 failures with 10 builds) on our build machine.  On 
> this machine, the default configuration would try to run 24 JVM instances 
> concurrently (forkCount of "1C" on a 24 core machine).
> The problem appears to have been introduced in surefire v2.20. When building 
> with surefire v2.19.1, the above mvn command is always successful on our 
> build machine.  Building with surefire v2.20 results in intermittent failures 
> (~60% failure rate).
> Using git bisection (and with the criterion for "good" as zero failures in 10 
> build attempts), I was able to determine that commit da7ff6aa2 "SUREFIRE-1342 
> Acknowledge normal exit of JVM and drain shared memory between processes" is 
> the first commit where surefire has this intermittent failure behaviour.
> From a casual scan through the patch, my guess is that the BYE_ACK support it 
> introduces is somehow racy (for example, reading or updating a field-member 
> outside of a monitor) and problems are triggered if there are a large number 
> of JVMs exiting concurrently.  So, with an increased number of concurrent JVMs 
> there is an increased risk of a thread losing the race, and so triggering 
> this error.
> Such a problem would be consistent with observed behaviour.  However, I don't 
> have any strong evidence that this is what is happening.
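
The kind of problem guessed at in the description above (a field read or updated outside of a monitor) can be illustrated with a minimal Java sketch. This is not Surefire's code and does not claim to show the actual defect: the names {{ByeAckSketch}}, {{AckState}}, {{acknowledge}} and {{awaitAck}} are hypothetical, chosen only for illustration. It shows one common way to avoid such a race, by doing every read and write of the shared flag while holding the same lock.

```java
// Illustrative sketch only: AckState and its methods are hypothetical names,
// not taken from the Surefire code base.
class ByeAckSketch {

    /** A one-shot acknowledgement flag whose reads and writes all happen under one monitor. */
    static final class AckState {
        private final Object lock = new Object();
        private boolean acknowledged; // guarded by lock

        /** Records the acknowledgement and wakes up every waiting thread. */
        void acknowledge() {
            synchronized (lock) {
                acknowledged = true;
                lock.notifyAll();
            }
        }

        /** Waits up to timeoutMillis; returns true if acknowledged, false on timeout or interrupt. */
        boolean awaitAck(long timeoutMillis) {
            long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
            synchronized (lock) {
                // Re-check the flag in a loop: wait() may return spuriously.
                while (!acknowledged) {
                    long remainingNanos = deadline - System.nanoTime();
                    if (remainingNanos <= 0) {
                        return false;
                    }
                    try {
                        // wait(0) would block forever, so wait at least 1 ms.
                        lock.wait(Math.max(1L, remainingNanos / 1_000_000L));
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return false;
                    }
                }
                return true;
            }
        }
    }

    public static void main(String[] args) {
        AckState state = new AckState();
        // A second thread plays the role of the side that sends the acknowledgement.
        Thread sender = new Thread(state::acknowledge, "ack-sender");
        sender.start();
        System.out.println("acknowledged=" + state.awaitAck(5_000));
    }
}
```

If the flag were instead a plain field written by one thread and polled by another without synchronization, the waiting thread would have no guarantee of ever seeing the update, and a timeout on the waiting side could then be misreported as a crash. Whether this is what happens in Surefire is exactly what the description leaves open.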



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
