[ https://issues.apache.org/jira/browse/GEODE-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Dodge updated GEODE-4666:
---------------------------------
    Description:
The geode-examples jobs are sometimes failing with port conflicts. Below is an example

[https://concourse.apachegeode-ci.info/teams/main/pipelines/develop/jobs/TestExamples/builds/24]
{noformat}
:serialization:start
1. Executing - start locator --name=locator --bind-address=127.0.0.1
....The Locator process terminated unexpectedly with exit status 1. Please refer to the log file in /tmp/build/ea3e9ea4/geode-examples/serialization/locator for full details.
Feb 12, 2018 11:25:19 PM org.apache.geode.distributed.LocatorLauncher failOnStart
INFO: locator is exiting due to an exception
java.net.BindException: Network is unreachable; port (10334) is not available on localhost.
    at org.apache.geode.distributed.AbstractLauncher.assertPortAvailable(AbstractLauncher.java:131)
    at org.apache.geode.distributed.LocatorLauncher.start(LocatorLauncher.java:635)
    at org.apache.geode.distributed.LocatorLauncher.run(LocatorLauncher.java:549)
    at org.apache.geode.distributed.LocatorLauncher.main(LocatorLauncher.java:192)
Exception in thread "main" java.lang.RuntimeException: An IO error occurred while starting a Locator in /tmp/build/ea3e9ea4/geode-examples/serialization/locator on localhost[10334]: Network is unreachable; port (10334) is not available on localhost.
    at org.apache.geode.distributed.LocatorLauncher.start(LocatorLauncher.java:655)
    at org.apache.geode.distributed.LocatorLauncher.run(LocatorLauncher.java:549)
    at org.apache.geode.distributed.LocatorLauncher.main(LocatorLauncher.java:192)
Caused by: java.net.BindException: Network is unreachable; port (10334) is not available on localhost.
    at org.apache.geode.distributed.AbstractLauncher.assertPortAvailable(AbstractLauncher.java:131)
    at org.apache.geode.distributed.LocatorLauncher.start(LocatorLauncher.java:635)
    ... 2 more
:serialization:start FAILED
{noformat}
I did some digging, and I think the cause is that the gfsh shutdown command from a previous test has not finished shutting down the locator.

Looking at the code, it looks like there is some code that is intended to shutdown and wait for a certain amount of time. But the logic is flawed, because it is executing a function and not waiting for the results.
{code:java}
Callable<String> shutdownNodes = () -> {
  try {
    Execution execution = FunctionService.onMembers(includeMembers);

    //****** HERE, execute submits the function asynchronously
    execution.execute(shutDownFunction);
  } catch (FunctionException functionEx) {
    // Expected Exception as the function is shutting down the target members and the result
    // collector will get member departed exception
  }
  return "SUCCESS";
};
Future<String> result = exec.submit(shutdownNodes);
result.get(timeout, TimeUnit.MILLISECONDS);
{code}

  was:
The geode-examples jobs are sometimes failing with port conflicts. Below is an example

https://concourse.apachegeode-ci.info/teams/main/pipelines/develop/jobs/TestExamples/builds/24
{noformat}
:serialization:start
1. Executing - start locator --name=locator --bind-address=127.0.0.1
....The Locator process terminated unexpectedly with exit status 1. Please refer to the log file in /tmp/build/ea3e9ea4/geode-examples/serialization/locator for full details.
Feb 12, 2018 11:25:19 PM org.apache.geode.distributed.LocatorLauncher failOnStart
INFO: locator is exiting due to an exception
java.net.BindException: Network is unreachable; port (10334) is not available on localhost.
    at org.apache.geode.distributed.AbstractLauncher.assertPortAvailable(AbstractLauncher.java:131)
    at org.apache.geode.distributed.LocatorLauncher.start(LocatorLauncher.java:635)
    at org.apache.geode.distributed.LocatorLauncher.run(LocatorLauncher.java:549)
    at org.apache.geode.distributed.LocatorLauncher.main(LocatorLauncher.java:192)
Exception in thread "main" java.lang.RuntimeException: An IO error occurred while starting a Locator in /tmp/build/ea3e9ea4/geode-examples/serialization/locator on localhost[10334]: Network is unreachable; port (10334) is not available on localhost.
    at org.apache.geode.distributed.LocatorLauncher.start(LocatorLauncher.java:655)
    at org.apache.geode.distributed.LocatorLauncher.run(LocatorLauncher.java:549)
    at org.apache.geode.distributed.LocatorLauncher.main(LocatorLauncher.java:192)
Caused by: java.net.BindException: Network is unreachable; port (10334) is not available on localhost.
    at org.apache.geode.distributed.AbstractLauncher.assertPortAvailable(AbstractLauncher.java:131)
    at org.apache.geode.distributed.LocatorLauncher.start(LocatorLauncher.java:635)
    ... 2 more
:serialization:start FAILED
{noformat}
I did some digging, and I think the cause is that the gfsh shutdown command from a previous test has not finished shutting down the locator.

Looking at the code, it looks like there is some code that is intended to shutdown and wait for a certain amount of time. But the logic is flawed, because it is executing a function and not waiting for te results.
{code}
Callable<String> shutdownNodes = () -> {
  try {
    Execution execution = FunctionService.onMembers(includeMembers);

    //****** HERE, execute submits the function asynchronously
    execution.execute(shutDownFunction);
  } catch (FunctionException functionEx) {
    // Expected Exception as the function is shutting down the target members and the result
    // collector will get member departed exception
  }
  return "SUCCESS";
};
Future<String> result = exec.submit(shutdownNodes);
result.get(timeout, TimeUnit.MILLISECONDS);
{code}


> CI failures in geode examples - Network is unreachable; port (10334) is not available on localhost
> --------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-4666
>                 URL: https://issues.apache.org/jira/browse/GEODE-4666
>             Project: Geode
>          Issue Type: Bug
>          Components: gfsh
>            Reporter: Dan Smith
>            Priority: Major
>
> The geode-examples jobs are sometimes failing with port conflicts. Below is an example
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/develop/jobs/TestExamples/builds/24]
> {noformat}
> :serialization:start
> 1. Executing - start locator --name=locator --bind-address=127.0.0.1
> ....The Locator process terminated unexpectedly with exit status 1. Please refer to the log file in /tmp/build/ea3e9ea4/geode-examples/serialization/locator for full details.
> Feb 12, 2018 11:25:19 PM org.apache.geode.distributed.LocatorLauncher failOnStart
> INFO: locator is exiting due to an exception
> java.net.BindException: Network is unreachable; port (10334) is not available on localhost.
>     at org.apache.geode.distributed.AbstractLauncher.assertPortAvailable(AbstractLauncher.java:131)
>     at org.apache.geode.distributed.LocatorLauncher.start(LocatorLauncher.java:635)
>     at org.apache.geode.distributed.LocatorLauncher.run(LocatorLauncher.java:549)
>     at org.apache.geode.distributed.LocatorLauncher.main(LocatorLauncher.java:192)
> Exception in thread "main" java.lang.RuntimeException: An IO error occurred while starting a Locator in /tmp/build/ea3e9ea4/geode-examples/serialization/locator on localhost[10334]: Network is unreachable; port (10334) is not available on localhost.
>     at org.apache.geode.distributed.LocatorLauncher.start(LocatorLauncher.java:655)
>     at org.apache.geode.distributed.LocatorLauncher.run(LocatorLauncher.java:549)
>     at org.apache.geode.distributed.LocatorLauncher.main(LocatorLauncher.java:192)
> Caused by: java.net.BindException: Network is unreachable; port (10334) is not available on localhost.
>     at org.apache.geode.distributed.AbstractLauncher.assertPortAvailable(AbstractLauncher.java:131)
>     at org.apache.geode.distributed.LocatorLauncher.start(LocatorLauncher.java:635)
>     ... 2 more
> :serialization:start FAILED
> {noformat}
> I did some digging, and I think the cause is that the gfsh shutdown command from a previous test has not finished shutting down the locator.
> Looking at the code, it looks like there is some code that is intended to shutdown and wait for a certain amount of time. But the logic is flawed, because it is executing a function and not waiting for the results.
> {code:java}
> Callable<String> shutdownNodes = () -> {
>   try {
>     Execution execution = FunctionService.onMembers(includeMembers);
>
>     //****** HERE, execute submits the function asynchronously
>     execution.execute(shutDownFunction);
>   } catch (FunctionException functionEx) {
>     // Expected Exception as the function is shutting down the target members and the result
>     // collector will get member departed exception
>   }
>   return "SUCCESS";
> };
> Future<String> result = exec.submit(shutdownNodes);
> result.get(timeout, TimeUnit.MILLISECONDS);
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
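
The flaw described in the issue is that the Callable returns as soon as the function has been submitted, so the timed get() on the Future has nothing left to wait for. It can be illustrated without Geode. The sketch below is only a stand-in under stated assumptions: it uses plain java.util.concurrent, and ShutdownWaitSketch, slowShutdown and shutdownObserved are hypothetical names that do not exist in gfsh or Geode. It is not the fix applied for GEODE-4666, just a picture of the difference between submitting asynchronous work and waiting on its result.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownWaitSketch {

  // Hypothetical stand-in for a member that needs `millis` to finish shutting down.
  static Future<?> slowShutdown(ExecutorService worker, AtomicBoolean stopped, long millis) {
    return worker.submit(() -> {
      try {
        Thread.sleep(millis);
        stopped.set(true);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
  }

  // Returns whether the shutdown had completed by the time the timed get() returned.
  static boolean shutdownObserved(boolean waitForResult, long shutdownMillis, long timeoutMillis)
      throws Exception {
    ExecutorService exec = Executors.newSingleThreadExecutor();
    ExecutorService worker = Executors.newSingleThreadExecutor();
    AtomicBoolean stopped = new AtomicBoolean(false);
    try {
      Callable<String> shutdownNodes = () -> {
        // Submitting the work is asynchronous, like the execute() call in the issue.
        Future<?> pending = slowShutdown(worker, stopped, shutdownMillis);
        if (waitForResult) {
          pending.get(); // block until the asynchronous work has actually finished
        }
        return "SUCCESS";
      };
      // The timeout only bounds how long shutdownNodes itself runs.
      exec.submit(shutdownNodes).get(timeoutMillis, TimeUnit.MILLISECONDS);
      return stopped.get();
    } finally {
      exec.shutdownNow();
      worker.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("fire and forget: " + shutdownObserved(false, 300, 5000));
    System.out.println("wait for result: " + shutdownObserved(true, 300, 5000));
  }
}
```

In the fire-and-forget case the outer get() returns almost immediately even though the simulated shutdown is still in progress, which matches the symptom of the next test finding port 10334 still in use. In Geode itself, Execution.execute returns a ResultCollector, so waiting on that collector's result would be the analogous step; whether that is the right remedy here is left to the fix for this issue.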