[ 
https://issues.apache.org/jira/browse/TINKERPOP-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421691#comment-17421691
 ] 

ASF GitHub Bot commented on TINKERPOP-2569:
-------------------------------------------

xiazcy commented on pull request #1478:
URL: https://github.com/apache/tinkerpop/pull/1478#issuecomment-928255527


   Hi Divij, thank you so much for your suggestions!
   
   In response to the two cases you outlined, and to make sure we are on 
the same page: a re-run of `Client.init()` is actually triggered when 
`Client.noLiveHostAvailable` is `true`, not `false`. The flag is set to 
`true` at the beginning of `Client.initializeImplementation()` and is set to 
`false` only if at least one host is alive during initialization, not the 
other way around. This follows an "assume no live hosts unless proven 
otherwise" logic. 
   
   More specifically, when there are dead hosts among live hosts, 
`Client.noLiveHostAvailable` is first set to `true` and then set to `false` 
as soon as any live host successfully initializes a connection pool inside 
the async call to `Client.initializeConnectionSetupForHost`. 
`Client.initialized` is also left as `true`, so the `if (!initialized || 
noLiveHostAvailable)` check inside `Client.submitAsync()` shouldn’t trigger 
`Client.init()`. Even if `Client.init()` is somehow called, its `if 
(initialized && !noLiveHostAvailable)` check would be `true` and any further 
initialization would be skipped. I believe this should avoid Case #2.
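
   To make the flag logic above concrete, here is a minimal, simplified 
model of the two flags. It is a sketch, not the driver code: the class 
`ClientStateSketch`, the `hostAlive` list, and the `initRuns` counter are 
hypothetical names for illustration, and only the two `if` conditions are 
taken from the description above.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Simplified model of the two flags discussed above. This is NOT the driver's
 * Client class: the field names mirror the discussion, the bodies are assumptions.
 */
final class ClientStateSketch {
    private boolean initialized = false;
    private boolean noLiveHostAvailable = false;
    private int initRuns = 0; // hypothetical counter, only here to observe skipped runs

    /** Mirrors the condition quoted from Client.submitAsync(). */
    boolean needsInit() {
        return !initialized || noLiveHostAvailable;
    }

    /** hostAlive holds one entry per configured host: true if its pool could be created. */
    void init(List<Boolean> hostAlive) {
        // Mirrors the guard quoted from Client.init(): skip if already usable.
        if (initialized && !noLiveHostAvailable) {
            return;
        }
        initRuns++;
        // Assume no live hosts unless proven otherwise.
        noLiveHostAvailable = true;
        for (boolean alive : hostAlive) {
            if (alive) {
                noLiveHostAvailable = false;
            }
        }
        // As noted in the discussion, initialized ends up true even with no live host.
        initialized = true;
    }

    boolean isInitialized() {
        return initialized;
    }

    boolean isNoLiveHostAvailable() {
        return noLiveHostAvailable;
    }

    int getInitRuns() {
        return initRuns;
    }

    public static void main(String[] args) {
        // Dead host among live hosts: one run is enough, later calls are skipped.
        ClientStateSketch mixed = new ClientStateSketch();
        mixed.init(Arrays.asList(false, true));
        mixed.init(Arrays.asList(false, true));
        System.out.println("mixed needsInit=" + mixed.needsInit()
                + " initRuns=" + mixed.getInitRuns());

        // Only dead hosts: initialized is true, yet the next submit would re-run init.
        ClientStateSketch dead = new ClientStateSketch();
        dead.init(Arrays.asList(false));
        System.out.println("dead initialized=" + dead.isInitialized()
                + " needsInit=" + dead.needsInit());
    }
}
```

   In this model the mixed case never re-runs initialization, while the 
all-dead case keeps `needsInit()` returning `true` until a host comes back, 
even though `initialized` is already `true`.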
   
   In any case, this change doesn’t resolve the fact that 
`Client.initialized` is set to `true` even when no host is available, which 
by definition it shouldn’t be. I’m working on a way to address this based on 
your suggestions.




> Reconnect to server if Java driver fails to initialize
> ------------------------------------------------------
>
>                 Key: TINKERPOP-2569
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2569
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: driver
>    Affects Versions: 3.4.11
>            Reporter: Stephen Mallette
>            Priority: Minor
>
> As reported here on SO: 
> https://stackoverflow.com/questions/67586427/how-to-recover-with-a-retry-from-gremlin-nohostavailableexception
> If the host is unavailable at {{Client}} initialization then the host is not 
> put in a state where reconnect is possible. Essentially, this test for 
> {{GremlinServerIntegrateTest}} should pass:
> {code}
> @Test
>     public void shouldFailOnInitiallyDeadHost() throws Exception {
>         // start test with no server
>         this.stopServer();
>         final Cluster cluster = TestClientFactory.build().create();
>         final Client client = cluster.connect();
>         try {
>             // try to re-issue a request now that the server is down
>             client.submit("g").all().get(3000, TimeUnit.MILLISECONDS);
>             fail("Should throw an exception.");
>         } catch (RuntimeException re) {
>             // Client would have no active connections to the host, hence it would
>             // encounter a timeout trying to find an alive connection to the host.
>             assertThat(re.getCause(), instanceOf(NoHostAvailableException.class));
>             //
>             // should recover when the server comes back
>             //
>             // restart server
>             this.startServer();
>             // try a bunch of times to reconnect. on slower systems this may simply
>             // take longer...looking at you travis
>             for (int ix = 1; ix < 11; ix++) {
>                 // the retry interval is 1 second, wait a bit longer
>                 TimeUnit.SECONDS.sleep(5);
>                 try {
>                     final List<Result> results = client.submit("1+1").all().get(3000, TimeUnit.MILLISECONDS);
>                     assertEquals(1, results.size());
>                     assertEquals(2, results.get(0).getInt());
>                 } catch (Exception ex) {
>                     if (ix == 10)
>                         fail("Should have eventually succeeded");
>                 }
>             }
>         } finally {
>             cluster.close();
>         }
>     }
> {code}
> Note that there is a similar test that first allows a connect to a host and 
> then kills it and then restarts it again called {{shouldFailOnDeadHost()}} 
> which demonstrates that reconnection works in that situation.
> I thought it might be an easy fix to simply call 
> {{considerHostUnavailable()}} in the {{ConnectionPool}} constructor in the 
> event of a {{CompletionException}}, which should kickstart the reconnect 
> process. The reconnects started firing but they all failed for some reason. I 
> didn't have time to investigate further than that. 
> Currently the only workaround is to recreate the {{Client}} if this sort of 
> situation occurs.


