Stephen Mallette created TINKERPOP-2569:
-------------------------------------------
Summary: Reconnect to server if Java driver fails to initialize
Key: TINKERPOP-2569
URL: https://issues.apache.org/jira/browse/TINKERPOP-2569
Project: TinkerPop
Issue Type: Bug
Components: driver
Affects Versions: 3.4.11
Reporter: Stephen Mallette

As reported here on SO:

https://stackoverflow.com/questions/67586427/how-to-recover-with-a-retry-from-gremlin-nohostavailableexception

If the host is unavailable at {{Client}} initialization then the host is not put in a state where reconnect is possible. Essentially, this test for {{GremlinServerIntegrateTest}} should pass:

{code}
@Test
public void shouldFailOnInitiallyDeadHost() throws Exception {
    // start test with no server
    this.stopServer();

    final Cluster cluster = TestClientFactory.build().create();
    final Client client = cluster.connect();

    try {
        // try to issue a request while the server is down
        client.submit("g").all().get(3000, TimeUnit.MILLISECONDS);
        fail("Should throw an exception.");
    } catch (RuntimeException re) {
        // Client would have no active connections to the host, hence it would encounter a timeout
        // trying to find an alive connection to the host.
        assertThat(re.getCause(), instanceOf(NoHostAvailableException.class));

        //
        // should recover when the server comes back
        //

        // restart server
        this.startServer();

        // try a bunch of times to reconnect. on slower systems this may simply take longer...looking at you travis
        for (int ix = 1; ix < 11; ix++) {
            // the retry interval is 1 second, wait a bit longer
            TimeUnit.SECONDS.sleep(5);

            try {
                final List<Result> results = client.submit("1+1").all().get(3000, TimeUnit.MILLISECONDS);
                assertEquals(1, results.size());
                assertEquals(2, results.get(0).getInt());
            } catch (Exception ex) {
                if (ix == 10)
                    fail("Should have eventually succeeded");
            }
        }
    } finally {
        cluster.close();
    }
}
{code}

Note that there is a similar test, {{shouldFailOnDeadHost()}}, that first allows a connection to a host, then kills the host, and then restarts it. That test demonstrates that reconnection works in that situation.

I thought it might be an easy fix to simply call {{considerHostUnavailable()}} in the {{ConnectionPool}} constructor in the event of a {{CompletionException}}, which should kickstart the reconnect process. The reconnects started firing but they all failed for some reason. I didn't have time to investigate further than that.

Currently the only workaround is to recreate the {{Client}} if this sort of situation occurs.
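The workaround of recreating the client could be sketched roughly as follows. This is only an illustration under stated assumptions: {{RecreatingClient}}, {{ThrowingFunction}} and {{call()}} are hypothetical names that are not part of the TinkerPop driver, and the sketch uses only the JDK so that it compiles on its own. It treats any failure as a reason to discard the current client, which is coarser than reacting only to {{NoHostAvailableException}}.

```java
import java.util.function.Supplier;

/**
 * Hypothetical wrapper (not a TinkerPop class): holds one client and, when a
 * request fails, closes it and builds a fresh one before the next attempt.
 */
public final class RecreatingClient<C extends AutoCloseable> implements AutoCloseable {

    /** Like java.util.function.Function, but the body may throw checked exceptions. */
    @FunctionalInterface
    public interface ThrowingFunction<T, R> {
        R apply(T t) throws Exception;
    }

    private final Supplier<C> factory;
    private C current;

    public RecreatingClient(final Supplier<C> factory) {
        this.factory = factory;
    }

    /**
     * Runs the request against the current client. On any exception the client
     * is discarded and, after a pause, a new one is created for the next attempt.
     * Rethrows the last failure once maxAttempts is exhausted.
     */
    public synchronized <R> R call(final ThrowingFunction<C, R> request,
                                   final int maxAttempts,
                                   final long backoffMillis) throws Exception {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (current == null) current = factory.get();
                return request.apply(current);
            } catch (Exception ex) {
                last = ex;
                discard(); // assumption: a failed client cannot recover by itself
                if (attempt < maxAttempts) Thread.sleep(backoffMillis);
            }
        }
        throw last;
    }

    /** Closes the current client, ignoring close failures, and forgets it. */
    private void discard() {
        if (current != null) {
            try {
                current.close();
            } catch (Exception ignored) {
                // the client is being thrown away regardless
            }
            current = null;
        }
    }

    @Override
    public synchronized void close() {
        discard();
    }
}
```

With the driver, {{C}} would presumably be a small adapter that wraps a {{Client}} and delegates {{close()}} to it, and the request would be something like the {{client.submit("1+1").all().get(...)}} call from the test above. Whether a fresh {{Client}} from the same {{Cluster}} is sufficient, or the {{Cluster}} has to be rebuilt as well, is not stated in this report and would need to be verified.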