[ https://issues.apache.org/jira/browse/TINKERPOP-2813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643533#comment-17643533 ]

ASF GitHub Bot commented on TINKERPOP-2813:
-------------------------------------------

vkagamlyk commented on code in PR #1882:
URL: https://github.com/apache/tinkerpop/pull/1882#discussion_r1040027645


##########
gremlin-driver/src/main/java/org/apache/tinkerpop/gremlin/driver/Client.java:
##########
@@ -495,49 +496,45 @@ public Client alias(final Map<String, String> aliases) {
         protected Connection chooseConnection(final RequestMessage msg) throws TimeoutException, ConnectionException {
             final Iterator<Host> possibleHosts;
             if (msg.optionalArgs(Tokens.ARGS_HOST).isPresent()) {
-                // TODO: not sure what should be done if unavailable - select new host and re-submit traversal?
+                // looking at this code about putting the Host on the RequestMessage in light of 3.5.4, not sure
+                // this is being used as intended here. server side usage is to place the channel.remoteAddress
+                // in this token in the status metadata for the response. can't remember why it is being used this
+                // way here exactly. created TINKERPOP-2821 to examine this more carefully to clean this up in a
+                // future version.
                 final Host host = (Host) msg.getArgs().get(Tokens.ARGS_HOST);
                 msg.getArgs().remove(Tokens.ARGS_HOST);
                 possibleHosts = IteratorUtils.of(host);
             } else {
                 possibleHosts = this.cluster.loadBalancingStrategy().select(msg);
             }
 
-            // you can get no possible hosts in more than a few situations. perhaps the servers are just all down.
-            // or perhaps the client is not configured properly (disables ssl when ssl is enabled on the server).
-            if (!possibleHosts.hasNext())
-                throwNoHostAvailableException();
-
-            final Host bestHost = possibleHosts.next();
+            // try a random host if none are marked available. maybe it will reconnect in the meantime. better than
+            // going straight to a fast NoHostAvailableException as was the case in versions 3.5.4 and earlier
+            final Host bestHost = possibleHosts.hasNext() ? possibleHosts.next() : chooseRandomHost();

Review Comment:
   For example, say we have a cluster of 3 instances.
   For some reason (reconfiguration, scaling, a network fluke) all instances restart.
   After some time we can connect to one random host and use it, but all the others will remain disconnected.
   Maybe it makes sense to check all unavailable hosts when the user sends a request, but not more than once every 10-60 seconds?
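To make the reviewer's suggestion concrete, here is a minimal Java sketch of host selection that prefers an available host, falls back to a random host when none are marked available (as the diff does with chooseRandomHost()), and re-probes all unavailable hosts at most once per interval. Everything in it is hypothetical: ThrottledHostSelector, SimpleHost, the probe predicate and the interval value are illustrative names, not TinkerPop driver API. A real driver would defer to its LoadBalancingStrategy and would likely reconnect asynchronously rather than probing inline while holding a lock.

```java
import java.util.List;
import java.util.Objects;
import java.util.Random;
import java.util.function.Predicate;

// Hypothetical sketch: none of these names exist in the TinkerPop driver.
public class ThrottledHostSelector {

    // Minimal stand-in for the driver's Host: an address plus an availability flag.
    public static final class SimpleHost {
        public final String address;
        private boolean available;

        public SimpleHost(String address, boolean available) {
            this.address = address;
            this.available = available;
        }

        public boolean isAvailable() { return available; }
    }

    private final List<SimpleHost> hosts;
    private final Predicate<SimpleHost> probe;   // assumed: returns true if a connection attempt succeeds
    private final long recheckIntervalMillis;    // e.g. somewhere in the suggested 10-60 second range
    private final Random random;
    private long lastRecheckMillis;
    private boolean recheckedOnce = false;

    public ThrottledHostSelector(List<SimpleHost> hosts, Predicate<SimpleHost> probe,
                                 long recheckIntervalMillis, Random random) {
        if (hosts.isEmpty()) throw new IllegalArgumentException("at least one host is required");
        this.hosts = List.copyOf(hosts);
        this.probe = Objects.requireNonNull(probe);
        this.recheckIntervalMillis = recheckIntervalMillis;
        this.random = Objects.requireNonNull(random);
    }

    // Re-probe every unavailable host, but no more often than once per recheck interval.
    // Returns true if a recheck actually ran.
    public synchronized boolean maybeRecheck(long nowMillis) {
        if (recheckedOnce && nowMillis - lastRecheckMillis < recheckIntervalMillis) return false;
        recheckedOnce = true;
        lastRecheckMillis = nowMillis;
        for (SimpleHost host : hosts) {
            if (!host.available && probe.test(host)) host.available = true;
        }
        return true;
    }

    // Called per request: a throttled recheck, then the first available host,
    // or a random configured host if none are marked available.
    public synchronized SimpleHost chooseHost(long nowMillis) {
        maybeRecheck(nowMillis);
        for (SimpleHost host : hosts) {
            if (host.available) return host;   // a real driver would use its load balancing strategy here
        }
        return hosts.get(random.nextInt(hosts.size()));
    }
}
```

In the three-instance restart scenario above, the first request after the restart probes all three hosts, and later requests only trigger another round of probes once the interval has elapsed, so hosts that come back are picked up without probing on every request.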





> Improve driver usability for cases where NoHostAvailableException is currently thrown
> -------------------------------------------------------------------------------------
>
>                 Key: TINKERPOP-2813
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2813
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: driver
>    Affects Versions: 3.5.4
>            Reporter: Stephen Mallette
>            Assignee: Stephen Mallette
>            Priority: Blocker
>
> A {{NoHostAvailableException}} occurs in two cases:
> 1. when the {{Client}} is initialized and a failure occurs on all configured {{Host}} instances
> 2. when the {{Client}} attempts to {{chooseConnection()}} to send a request and all configured {{Host}} instances are marked unavailable.
> In the first case, you can get a cause for the failure, which is helpful, but
> the inadequacy is that you only get the failure of the first {{Host}} to
> cause a problem. The second case is a bit worse because there you get no
> cause in the exception, and it's a "fast fail": as soon as the request is
> sent, there is no pause to see if the {{Host}} comes back online. Moreover,
> a {{Host}} can be marked unavailable because of an infraction by just a
> single {{Connection}} that may have just encountered an intermittent network
> issue, thus quite quickly killing the entire {{ConnectionPool}} and turning
> 100s of requests per second into 100s of {{NoHostAvailableException}} per
> second. Note that you can also get an infraction just because the pool is
> overloaded with requests, which may signal that either the pool or the
> server is not sized right for the current workload. In either case, the
> {{NoHostAvailableException}} is a bit of a harsh way to deal with that, and
> in any event it doesn't give the user clues as to how to deal with it.
> All in all, this situation makes {{NoHostAvailableException}} hard to debug.
> This ticket is meant to help smooth over some of these problems. Initial
> thoughts for improvements include better logging, ensuring that
> {{NoHostAvailableException}} is not thrown without a cause, preferring more
> specific exceptions to {{NoHostAvailableException}} in the first place,
> getting rid of "fast fails" in favor of longer pauses to see if a host can
> recover, and taking a softer stance on when a {{Host}} is actually considered
> "unavailable".
> The expectation is to implement this without breaking API changes, though
> exceptions may shift around a bit; we will try to keep those to a minimum.
>  
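The ticket's idea of taking a softer stance on when a Host is considered unavailable could, for instance, take the form of a consecutive-failure threshold. The Java sketch below assumes that approach; HostHealth, recordFailure(), recordSuccess() and the threshold parameter are hypothetical names for illustration, not TinkerPop driver API. A single intermittent failure on one Connection no longer marks the host unavailable, and any success resets the streak.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not part of the TinkerPop driver: a host is only marked
// unavailable after a run of consecutive failures, so one flaky Connection does
// not take the whole host (and its pool) out of rotation.
public class HostHealth {
    private final int failureThreshold;
    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private final AtomicBoolean available = new AtomicBoolean(true);

    public HostHealth(int failureThreshold) {
        if (failureThreshold < 1) throw new IllegalArgumentException("failureThreshold must be >= 1");
        this.failureThreshold = failureThreshold;
    }

    // Records a failed request or connection attempt. Returns true only for the
    // single call that flips the host from available to unavailable.
    public boolean recordFailure() {
        int failures = consecutiveFailures.incrementAndGet();
        return failures >= failureThreshold && available.compareAndSet(true, false);
    }

    // Any success resets the failure streak and marks the host available again.
    public void recordSuccess() {
        consecutiveFailures.set(0);
        available.set(true);
    }

    public boolean isAvailable() {
        return available.get();
    }
}
```

With a threshold of 1 this reduces to roughly the behavior the ticket describes, where a single infraction marks the host unavailable; larger thresholds trade slower detection of a truly dead host for fewer spurious NoHostAvailableException bursts.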



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
