[jira] [Commented] (TINKERPOP-2569) Reconnect to server if Java driver fails to initialize

ASF GitHub Bot (Jira) Thu, 07 Oct 2021 04:46:19 -0700


    [ 
https://issues.apache.org/jira/browse/TINKERPOP-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425503#comment-17425503
 ]


ASF GitHub Bot commented on TINKERPOP-2569:
-------------------------------------------

spmallette commented on pull request #1476:
URL: https://github.com/apache/tinkerpop/pull/1476#issuecomment-937714100


   @xiazcy i'm sorry but i think i asked you change the behavior of this test 
for the Gremlin Console 
   
   ```java
       @Test
       public void shouldConnect() throws Exception {
           // there is no gremlin server running for this test, but 
gremlin-driver lazily connects so this should
           // be ok to just validate that a connection is created
           
assertThat(acceptor.connect(Arrays.asList(Storage.toPath(TestHelper.generateTempFileFromResource(this.getClass(),
 "remote.yaml", ".tmp")))).toString(),
                      startsWith("Configured "));
       }
   ```
   
   and perhaps that was a mistake to ask you to do that. As it currently sits, 
we get this behavior now:
   
   ```text
   gremlin> :remote connect tinkerpop.server conf/remote.yaml
   ERROR org.apache.tinkerpop.gremlin.driver.Client  - Could not initialize 
client for Host{address=localhost/127.0.0.1:8182, 
hostUri=ws://localhost:8182/gremlin}
   ERROR org.apache.tinkerpop.gremlin.driver.Client  - 
   java.net.ConnectException: Connection refused
        at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at 
java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
        at 
io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
        at 
io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
        at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:707)
        at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
        at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
        at 
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at 
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at java.base/java.lang.Thread.run(Thread.java:829)
   ==>Error during 'connect' - java.net.ConnectException: Connection refused
   ```
   
   Not only do we get the big stacktrace, behind the scenes but the `Cluster` 
object is initialized and would be trying to reconnect to the server when it 
comes online. I think that the Gremlin Console could handle this issue a bit 
better by catching the `NoHostAvailableException` explicitly and reporting a 
more user friendly message, like "Configured <cluster> <session> but host not 
presently available". At that point we could change that test back to what it 
was and the behavior would be the same as before but with a far nicer message.
   
   Could you please submit a PR for that to complete this issue? You could 
submit the work under the same JIRA ticket. 
   
   Some additional details:
   
   1. The exception is thrown 
[here](https://github.com/apache/tinkerpop/blob/3.4-dev/gremlin-console/src/main/java/org/apache/tinkerpop/gremlin/console/jsr223/DriverRemoteAcceptor.java#L111)
 so you would just add the catch for `NoHostAvailableException` there.
   2. Rather than throw an exception in that catch you would `return` the nice 
error message
   3. A bit of a side issue that I noticed while looking at this was the code 
path where an exception is propagated from that code in the link above and the 
`RemoteCommand` handles it 
[here](https://github.com/apache/tinkerpop/blob/3.4-dev/gremlin-console/src/main/groovy/org/apache/tinkerpop/gremlin/console/commands/RemoteCommand.groovy#L55).
 I feel like something isn't quite right there because a `remote` will never 
have `close()` called on it to release the `Cluster` object. So maybe there is 
a leak there? Perhaps you could take a minute to analyze that and see if a 
little fix is necessary there?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Reconnect to server if Java driver fails to initialize
> ------------------------------------------------------
>
>                 Key: TINKERPOP-2569
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2569
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: driver
>    Affects Versions: 3.4.11
>            Reporter: Stephen Mallette
>            Priority: Minor
>
> As reported here on SO: 
> https://stackoverflow.com/questions/67586427/how-to-recover-with-a-retry-from-gremlin-nohostavailableexception
> If the host is unavailable at {{Client}} initialization then the host is not 
> put in a state where reconnect is possible. Essentially, this test for 
> {{GremlinServerIntegrateTest}} should pass:
> {code}
> @Test
>     public void shouldFailOnInitiallyDeadHost() throws Exception {
>         // start test with no server
>         this.stopServer();
>         final Cluster cluster = TestClientFactory.build().create();
>         final Client client = cluster.connect();
>         try {
>             // try to re-issue a request now that the server is down
>             client.submit("g").all().get(3000, TimeUnit.MILLISECONDS);
>             fail("Should throw an exception.");
>         } catch (RuntimeException re) {
>             // Client would have no active connections to the host, hence it 
> would encounter a timeout
>             // trying to find an alive connection to the host.
>             assertThat(re.getCause(), 
> instanceOf(NoHostAvailableException.class));
>             //
>             // should recover when the server comes back
>             //
>             // restart server
>             this.startServer();
>             // try a bunch of times to reconnect. on slower systems this may 
> simply take longer...looking at you travis
>             for (int ix = 1; ix < 11; ix++) {
>                 // the retry interval is 1 second, wait a bit longer
>                 TimeUnit.SECONDS.sleep(5);
>                 try {
>                     final List<Result> results = 
> client.submit("1+1").all().get(3000, TimeUnit.MILLISECONDS);
>                     assertEquals(1, results.size());
>                     assertEquals(2, results.get(0).getInt());
>                 } catch (Exception ex) {
>                     if (ix == 10)
>                         fail("Should have eventually succeeded");
>                 }
>             }
>         } finally {
>             cluster.close();
>         }
>     }
> {code}
> Note that there is a similar test that first allows a connect to a host and 
> then kills it and then restarts it again called {{shouldFailOnDeadHost()}} 
> which demonstrates that reconnection works in that situation.
> I thought it might be an easy to fix to simply call 
> {{considerHostUnavailable()}} in the {{ConnectionPool}} constructor in the 
> event of a {{CompletionException}} which should kickstart the reconnect 
> process. The reconnects started firing but they all failed for some reason. I 
> didn't have time to investigate further than than. 
> Currently the only workaround is to recreate the `Client` if this sort of 
> situation occurs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (TINKERPOP-2569) Reconnect to server if Java driver fails to initialize

Reply via email to