Re: Java 11 tests are very flaky on GitHub Actions

Diego Salvi - Diennea Mon, 08 Feb 2021 09:18:39 -0800

Hi Stig,

Yes you can receive from maven the fork instance and the maximum fork count and 
then using such values to compute custom ports to run parallel tests.
The boring part is to use configure every test to use computed ports and not 
default one.

You have to configure surefire to setup some system properties:

<properties>
    <forkCount>4</forkCount>
</properties>

<plugins>
  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <version>2.21.0</version>
    <configuration>
      <forkCount>${forkCount}</forkCount>
      <runOrder>balanced</runOrder>
      <systemPropertyVariables>
        <!-- Due to a surefire bug we cannot get directly the fork number but 
we need a prefix (“current:”) or it will be null -->
        <my.fork.num>current:${surefire.forkNumber}</my.fork.num>
        <my.fork.max>${forkCount}</my.fork.max>
      </systemPropertyVariables>
    </configuration>
  </plugin>
</plugins>

Then you can create a base test class o a test rule to read such system 
properties and evaluate a custom port when requested.
I gone for the base class but the test rule is possible too:

public abstract class BaseTest {

    public static final String NUM_PROPERTY_PREFIX = "current:";

    /**
     * Fork "position" in [0,len)
     */
    protected final int pos;

    /**
     * Max fork count
     */
    protected final int len;

    public BaseTestUtils() {
        System.out.println("[TESTENV] my.fork.num: " + 
System.getProperty("my.fork.num"));
        System.out.println("[TESTENV] my.fork.max: " + 
System.getProperty("my.fork.max"));

        final String num = System.getProperty("my.fork.num", 
NUM_PROPERTY_PREFIX + "1");

        pos = Integer.valueOf(num.substring(NUM_PROPERTY_PREFIX.length(), 
num.length())) - 1;
        len = Integer.getInteger("my.fork.max", 1);

    }

    public int evaluatePort(int base) {
        return evaluatePort(pos, len, base);
    }

    private static int evaluatePort(int pos, int len, int base) {

        /* max: 65535, we use just first 16 bits */

        if (len <= pos || len <= 0 || pos < 0) {
            throw new IllegalArgumentException("Len or pos invalid");
        }

        int max = len - 1;
        int prefixBitsNeeded = Integer.SIZE - Integer.numberOfLeadingZeros(max);

        int bitsRemaining = 16 - prefixBitsNeeded;

        int suffixBitNeeded = Integer.SIZE - Integer.numberOfLeadingZeros(base);

        if (suffixBitNeeded > bitsRemaining) {
            throw new IllegalArgumentException("Base too large: pos " + pos + " 
len " + len + " base " + base
                    + "; needed bits " + suffixBitNeeded + " remaining " + 
bitsRemaining);
        }

        int port = (pos << bitsRemaining) + base;

        return port;
    }

}

In EmailSuccess we use a tricky “shift” logic to compute a new port but smarter 
or simpler ones could be written too (something like port + pos * 5000).

Diego Salvi

Da: Enrico Olivelli <[email protected]>
Data: lunedì 8 febbraio 2021 11:49
A: DevZooKeeper <[email protected]>
Cc: Diego Salvi - Diennea <[email protected]>
Oggetto: Re: Java 11 tests are very flaky on GitHub Actions

Stig,
thanks for your information.

Unfortunately this is not doable in our test suite, because we have to create 
server side configuration for an ensemble of servers before starting them.

I am afraid that in order to implement a safe port-assignment we will have to 
instrument all of the code that is opening server side sockets in order to use 
mocks or pre-created sockets.
But this is not easy because we sometimes have tests that close and reopen the 
ServerSocket, but doing so we will "lose" the port.

I saw in other projects (non OSS, so I cannot send pointers) that you can 
receive from the Maven environment the if of the "forked JVM" and then you can 
compute all of the ports used by the tests from that value.
For instance if you start 4 parallel tests you will have that JVM id spanning 
from 0 to 3 and you receive that value as a System Property.
Then the test can use that I in order to compute static ports, like JVMID * 
1000 + X .

I am cc'ing one of my former colleagues at EmailSuccess.com, Diego, that 
probably can share his experience in EmailSuccess

Enrico

Il giorno sab 6 feb 2021 alle ore 16:32 Stig Rohde Døssing 
<[email protected]<mailto:[email protected]>> ha scritto:
I'm not sure if this is easy to solve for Zookeeper, but going to make a
suggestion here I've seen work on other projects.

The "find a free port" code is inherently racy. I've seen that solution
used in many other projects, and it never works that well.

A much safer way to do it is to pass port 0 directly to the server you're
trying to start. As you know from the port reservation code, this will
cause the OS to use a free port. The port selected by the OS can be
retrieved by the test by calling e.g.
https://docs.oracle.com/javase/7/docs/api/java/net/ServerSocket.html#getLocalPort().

You go from this flow
Open ServerSocket on 0
Get port via ServerSocket.getLocalPort
Close socket <- This is the dangerous bit, as after this, the port is again
up for grabs for other tests to acquire
Pass chosen port to server
Server opens ServerSocket on port

To
Pass 0 to server
Server opens ServerSocket on 0, getting a random port
Test calls some method on the server that ends up in a call to
ServerSocket.getLocalPort

This won't work for cases where you're testing e.g. the client starting
before the server, and connecting once the server has booted, but for the
common use case of starting a server, and then starting a client connected
to that server, this should work.

On 2021/02/05 15:58:00, Christopher <[email protected]<mailto:[email protected]>> 
wrote:
> FWIW, the maven-surefire-plugin should be able to retry temporarily>
> failing tests, but it isn't working with JUnit5 until the plugin is>
> updated to a newer version. However, the newer version of the plugin>
> didn't work with the versions of JUnit that ZK is using. When I tried>
> to update maven-surefire-plugin, things got worse due to JUnit4/JUnit5>
> stuff that I don't understand.>
>
> I have created a PR to trigger the tests to run on JDK8 instead of>
> JDK11, to demonstrate that the tests are flaky there (or to prove me>
> wrong), but it doesn't need to be merged, as it's just a test:>
> https://github.com/apache/zookeeper/pull/1595>
>
> On Fri, Feb 5, 2021 at 10:48 AM Christopher 
> <[email protected]<mailto:[email protected]>> wrote:>
> >>
> > These tests are flaky on JDK8, too, when I tried. It's my>
> > understanding that's why they were not being run on Travis previously>
> > (still aren't). Most of the tests that I see failing are due to the>
> > "Address already in use" bind error. This may be more likely in a>
> > virtualized environment like GitHub Actions (vs. non-vritualized>
> > Jenkins), but I don't think it is unique to that environment... just>
> > maybe more likely for whatever reason.>
> >>
> > I have spent quite a bit of time looking into this, and I think the>
> > main cause is that the port reservation stuff isn't working the way it>
> > should. It tries to bind to a port, and then it closes the>
> > ServerSocket that was used to find an available port. What it should>
> > do instead is return the ServerSocket itself for use in the calling>
> > code, after it has successfully bound, rather than returning an>
> > integer. But, that might be a big change, and there might be a simpler>
> > fix.>
> >>
> > On Fri, Feb 5, 2021 at 10:25 AM Enrico Olivelli 
> > <[email protected]<mailto:[email protected]>>
wrote:>
> > >>
> > > Hi,>
> > > I see that the new test workflow with Java 11 is very flaky.>
> > >>
> > > This is an example>
> > >
https://github.com/apache/zookeeper/pull/1592/checks?check_run_id=1830428694>

> > >>
> > > I would like to not consider it blocker for merging pull requests.>
> > >>
> > > That said, we should investigate further, the tests that are failing
were>
> > > not flaky on JDK8>
> > >>
> > > Thoughts ?>
> > > Enrico>
>

________________________________

CONFIDENTIALITY & PRIVACY NOTICE
This e-mail (including any attachments) is strictly confidential and may also 
contain privileged information. If you are not the intended recipient you are 
not authorised to read, print, save, process or disclose this message. If you 
have received this message by mistake, please inform the sender immediately and 
destroy this e-mail, its attachments and any copies. Any use, distribution, 
reproduction or disclosure by any person other than the intended recipient is 
strictly prohibited and the person responsible may incur in penalties.
The use of this e-mail is only for professional purposes; there is no guarantee 
that the correspondence towards this e-mail will be read only by the recipient, 
because, under certain circumstances, there may be a need to access this email 
by third subjects belonging to the Company.

Re: Java 11 tests are very flaky on GitHub Actions

Reply via email to