[ 
https://issues.apache.org/jira/browse/GEODE-9802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Donal Evans resolved GEODE-9802.
--------------------------------
    Resolution: Fixed

> LoggingWithReconnectDistributedTest uses ephemeral port to create servers, 
> leading to occasional failures with java.net.BindException: Address already 
> in use
> -------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-9802
>                 URL: https://issues.apache.org/jira/browse/GEODE-9802
>             Project: Geode
>          Issue Type: Bug
>    Affects Versions: 1.15.0
>            Reporter: Donal Evans
>            Assignee: Donal Evans
>            Priority: Major
>              Labels: flaky, pull-request-available
>
> Seen originally in distributed mass test run:
> {noformat}
> > Task :geode-core:distributedTest
> LoggingWithReconnectDistributedTest > logFileContainsBannerOnlyOnce FAILED
>     org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.logging.internal.LoggingWithReconnectDistributedTest$$Lambda$547/1860776670.run
>  in VM -1 running on Host 
> heavy-lifter-e58d94dc-0688-534f-8361-75ac377b5300.c.apachegeode-ci.internal 
> with 4 VMs
>         at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:631)
>         at org.apache.geode.test.dunit.VM.invoke(VM.java:448)
>         at 
> org.apache.geode.logging.internal.LoggingWithReconnectDistributedTest.logFileContainsBannerOnlyOnce(LoggingWithReconnectDistributedTest.java:141)
>         Caused by:
>         org.apache.geode.distributed.DistributedSystemDisconnectedException: 
> Reconnect attempts terminated due to exception, caused by 
> org.apache.geode.GemFireIOException: While starting cache server CacheServer 
> on port=46103 client subscription config policy=none client subscription 
> config capacity=1 client subscription config overflow directory=.
>             at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.waitUntilReconnected(InternalDistributedSystem.java:2916)
>             at 
> org.apache.geode.logging.internal.LoggingWithReconnectDistributedTest.lambda$logFileContainsBannerOnlyOnce$bb17a952$2(LoggingWithReconnectDistributedTest.java:147)
>             at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>             at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>             at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>             at java.lang.reflect.Method.invoke(Method.java:498)
>             at 
> org.apache.geode.test.dunit.internal.MethodInvoker.executeObject(MethodInvoker.java:123)
>             at 
> org.apache.geode.test.dunit.internal.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:78)
>             at 
> org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:628)
>             ... 2 more
>             Caused by:
>             org.apache.geode.GemFireIOException: While starting cache server 
> CacheServer on port=46103 client subscription config policy=none client 
> subscription config capacity=1 client subscription config overflow directory=.
>                 at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.createAndStartCacheServers(InternalDistributedSystem.java:2773)
>                 at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2653)
>                 at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2408)
>                 at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:1254)
>                 at 
> org.apache.geode.distributed.internal.ClusterDistributionManager$DMListener.membershipFailure(ClusterDistributionManager.java:2329)
>                 at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership.uncleanShutdown(GMSMembership.java:1190)
>                 at 
> org.apache.geode.distributed.internal.membership.gms.GMSMembership$ManagerImpl.lambda$uncleanShutdownDS$0(GMSMembership.java:1794)
>                 at java.lang.Thread.run(Thread.java:748)
>                 Caused by:
>                 java.net.BindException: Failed to create server socket on 
> 10.0.0.107[46103]
>                     at 
> org.apache.geode.distributed.internal.tcpserver.ClusterSocketCreatorImpl.createServerSocket(ClusterSocketCreatorImpl.java:75)
>                     at 
> org.apache.geode.internal.net.SCClusterSocketCreator.createServerSocket(SCClusterSocketCreator.java:55)
>                     at 
> org.apache.geode.internal.net.SocketCreator.createServerSocket(SocketCreator.java:524)
>                     at 
> org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.<init>(AcceptorImpl.java:573)
>                     at 
> org.apache.geode.internal.cache.tier.sockets.AcceptorBuilder.create(AcceptorBuilder.java:291)
>                     at 
> org.apache.geode.internal.cache.CacheServerImpl.createAcceptor(CacheServerImpl.java:420)
>                     at 
> org.apache.geode.internal.cache.CacheServerImpl.start(CacheServerImpl.java:377)
>                     at 
> org.apache.geode.distributed.internal.InternalDistributedSystem.createAndStartCacheServers(InternalDistributedSystem.java:2769)
>                     ... 7 more
>                     Caused by:
>                     java.net.BindException: Address already in use (Bind 
> failed)
>                         at java.net.PlainSocketImpl.socketBind(Native Method)
>                         at 
> java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
>                         at java.net.ServerSocket.bind(ServerSocket.java:390)
>                         at 
> org.apache.geode.distributed.internal.tcpserver.ClusterSocketCreatorImpl.createServerSocket(ClusterSocketCreatorImpl.java:72)
>                         ... 14 more
> 8334 tests completed, 1 failed, 414 skipped
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=  Test Results URI 
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0646/test-results/distributedTest/1636187130/
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test report artifacts from this job are available at:
> http://files.apachegeode-ci.info/builds/apache-develop-mass-test-run/1.15.0-build.0646/test-artifacts/1636187130/distributedtestfiles-openjdk8-1.15.0-build.0646.tgz
> {noformat}
> The createServer method in LoggingWithReconnectDistributedTest uses a port 
> number of 0, which results in an ephemeral port being assigned:
> {noformat}
>   private void createServer(String serverName, File serverDir, int locatorPort) {
>     ServerLauncher.Builder builder = new ServerLauncher.Builder();
>     builder.setMemberName(serverName);
>     builder.setWorkingDirectory(serverDir.getAbsolutePath());
>     builder.setServerPort(0);
>     builder.set(LOCATORS, "localHost[" + locatorPort + "]");
>     builder.set(DISABLE_AUTO_RECONNECT, "false");
>     builder.set(ENABLE_CLUSTER_CONFIGURATION, "false");
>     builder.set(MAX_WAIT_TIME_RECONNECT, "1000");
>     builder.set(MEMBER_TIMEOUT, "2000");
>     serverLauncher = builder.build();
>     serverLauncher.start();
>     system = (InternalDistributedSystem) serverLauncher.getCache().getDistributedSystem();
>   }
> {noformat}
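> When the server is restarted during reconnect, it tries to bind the same port
> number again (46103 in the trace above). If something else has taken that port
> in the meantime, the bind fails with the BindException. The test should be
> changed to use AvailablePortHelper instead.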
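
The failure mode described above can be reproduced with plain JDK sockets. The following is a minimal sketch, not code from the test or from Geode: the class and method names are hypothetical, and a second ServerSocket stands in for whatever process grabs the port between the disconnect and the reconnect.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

// Hypothetical demo class; not part of Geode or of the test under discussion.
class EphemeralPortDemo {

  /** Binds to port 0, lets the OS pick a port, then releases it and returns the number. */
  static int bindEphemeralAndRelease() throws IOException {
    try (ServerSocket first = new ServerSocket(0)) {
      // Port 0 asks the OS for any free port; getLocalPort() reports the one chosen.
      return first.getLocalPort();
    }
  }

  /**
   * Simulates the reconnect: another socket ("intruder") takes the port first,
   * then a rebind to the same port is attempted.
   * Returns true if the rebind fails with BindException.
   */
  static boolean rebindFailsWhileHeld(int port) throws IOException {
    try (ServerSocket intruder = new ServerSocket(port)) {
      try (ServerSocket rebind = new ServerSocket(port)) {
        return false; // rebind unexpectedly succeeded
      } catch (BindException expected) {
        return true; // same class of failure as in the stack trace
      }
    }
  }

  public static void main(String[] args) throws IOException {
    int port = bindEphemeralAndRelease();
    System.out.println("OS assigned port " + port);
    System.out.println("rebind failed: " + rebindFailsWhileHeld(port));
  }
}
```

The sketch only shows why reusing an OS-assigned port is fragile. It does not show how AvailablePortHelper selects ports.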



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
