Stephen Mallette created TINKERPOP-2569:
-------------------------------------------

             Summary: Reconnect to server if Java driver fails to initialize
                 Key: TINKERPOP-2569
                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2569
             Project: TinkerPop
          Issue Type: Bug
          Components: driver
    Affects Versions: 3.4.11
            Reporter: Stephen Mallette


As reported on Stack Overflow:
https://stackoverflow.com/questions/67586427/how-to-recover-with-a-retry-from-gremlin-nohostavailableexception

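If the host is unavailable when the {{Client}} initializes, the host is never 
put into a state from which reconnection is possible, so the {{Client}} keeps 
failing with {{NoHostAvailableException}} even after the server comes back. In 
essence, the following test for {{GremlinServerIntegrateTest}} should pass: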

{code}
@Test
public void shouldFailOnInitiallyDeadHost() throws Exception {

    // start test with no server
    this.stopServer();

    final Cluster cluster = TestClientFactory.build().create();
    final Client client = cluster.connect();

    try {
        // issue the first request while the server is down
        client.submit("g").all().get(3000, TimeUnit.MILLISECONDS);
        fail("Should throw an exception.");
    } catch (RuntimeException re) {
        // Client would have no active connections to the host, hence it would encounter a timeout
        // trying to find an alive connection to the host.
        assertThat(re.getCause(), instanceOf(NoHostAvailableException.class));

        //
        // should recover when the server comes back
        //

        // restart server
        this.startServer();

        // try a bunch of times to reconnect. on slower systems this may simply take longer...looking at you Travis
        for (int ix = 1; ix < 11; ix++) {
            // the retry interval is 1 second, wait a bit longer
            TimeUnit.SECONDS.sleep(5);

            try {
                final List<Result> results = client.submit("1+1").all().get(3000, TimeUnit.MILLISECONDS);
                assertEquals(1, results.size());
                assertEquals(2, results.get(0).getInt());
                break; // reconnected, so stop retrying
            } catch (Exception ex) {
                if (ix == 10)
                    fail("Should have eventually succeeded");
            }
        }
    } finally {
        cluster.close();
    }
}
{code}

Note that a similar existing test, {{shouldFailOnDeadHost()}}, first connects 
to a live host, then kills the server and restarts it. That test demonstrates 
that reconnection works in that situation.

I thought it might be an easy fix to simply call 
{{considerHostUnavailable()}} in the {{ConnectionPool}} constructor when a 
{{CompletionException}} occurs, which should kick-start the reconnect process. 
The reconnect attempts did start firing, but they all failed for some reason. 
I didn't have time to investigate further than that.

Currently, the only workaround is to recreate the {{Client}} when this 
situation occurs.
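
The workaround amounts to discarding the stuck {{Client}} and building a new one. A minimal, generic sketch of that pattern follows; {{RecreatingRunner}} is a hypothetical helper, not part of the driver. With the TinkerPop driver the factory would presumably be {{cluster::connect}} and the closer {{Client::close}}, with {{op}} wrapping the {{client.submit(...)}} call (rethrowing any checked exception as an unchecked one). The sketch retries immediately; a real caller would likely wait between attempts.

{code:java}
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper; NOT part of the TinkerPop driver API.
// Holds one client at a time and replaces it whenever a call fails.
final class RecreatingRunner<C> {
    private final Supplier<C> factory;   // builds a fresh client
    private final Consumer<C> closer;    // releases a broken client
    private C current;

    RecreatingRunner(Supplier<C> factory, Consumer<C> closer) {
        this.factory = factory;
        this.closer = closer;
        this.current = factory.get();
    }

    // Runs `op` against the current client. On a RuntimeException the client
    // is closed and replaced with a fresh one; after `maxAttempts` failures
    // the last exception is rethrown.
    synchronized <R> R run(Function<C, R> op, int maxAttempts) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.apply(current);
            } catch (RuntimeException e) {
                last = e;
                closer.accept(current);
                current = factory.get();
            }
        }
        throw last;
    }
}
{code}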



--
This message was sent by Atlassian Jira
(v8.3.4#803005)