[ 
https://issues.apache.org/jira/browse/GEODE-9808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456072#comment-17456072
 ] 

ASF subversion and git services commented on GEODE-9808:
--------------------------------------------------------

Commit 1e66771a546462e89b6e11aaef294fb0e05d524c in geode's branch 
refs/heads/develop from Donal Evans
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=1e66771 ]

GEODE-9808: Throw appropriate exception in AutoConnectionSourceImpl (#7143)

 - Throw NoAvailableServersException instead of NoAvailableLocatorsException in 
AutoConnectionSourceImpl if queryLocators() returns a response with no result 
 - Refactor and fix up AutoConnectionSourceImplJUnitTest
 - Modify tests in AutoConnectionSourceImplJUnitTest to cover new
 behaviour

Authored-by: Donal Evans <doev...@vmware.com>

> Client ops fail with NoAvailableLocatorsException when all servers leave the 
> DS 
> --------------------------------------------------------------------------------
>
>                 Key: GEODE-9808
>                 URL: https://issues.apache.org/jira/browse/GEODE-9808
>             Project: Geode
>          Issue Type: Bug
>          Components: client/server
>    Affects Versions: 1.15.0
>            Reporter: Bill Burcham
>            Assignee: Donal Evans
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.15.0
>
>
> When there are no cache servers (only locators) in a cluster, client 
> operations will fail with a misleading exception:
> {noformat}
> org.apache.geode.cache.client.NoAvailableLocatorsException: Unable to connect to any locators in the list [gemfire-cluster-locator-0.gemfire-cluster-locator.namespace-1850250019.svc.cluster.local:10334, gemfire-cluster-locator-1.gemfire-cluster-locator.namespace-1850250019.svc.cluster.local:10334, gemfire-cluster-locator-2.gemfire-cluster-locator.namespace-1850250019.svc.cluster.local:10334]
>     at org.apache.geode.cache.client.internal.AutoConnectionSourceImpl.findServer(AutoConnectionSourceImpl.java:174)
>     at org.apache.geode.cache.client.internal.ConnectionFactoryImpl.createClientToServerConnection(ConnectionFactoryImpl.java:211)
>     at org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.createPooledConnection(ConnectionManagerImpl.java:196)
>     at org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.forceCreateConnection(ConnectionManagerImpl.java:227)
>     at org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.exchangeConnection(ConnectionManagerImpl.java:365)
>     at org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:161)
>     at org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:120)
>     at org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:805)
>     at org.apache.geode.cache.client.internal.PutOp.execute(PutOp.java:91)
> {noformat}
> Even though the client is able to connect to a locator, we encounter a 
> NoAvailableLocatorsException with the message "Unable to connect to 
> any locators in the list".
> Investigating the product code we see:
>  # If there are no cache servers in the cluster, ServerLocator.pickServer() 
> always constructs a ClientConnectionResponse(null), which causes that 
> object’s hasResult() to return false in the loop-termination check in 
> AutoConnectionSourceImpl.queryLocators().
>  # The exception wording is misleading not only in 
> AutoConnectionSourceImpl.findServer() but also in at least two 
> other calling locations in AutoConnectionSourceImpl: findReplacementServer() 
> and findServersForQueue().
>  # In each of those cases the calling method translates a null response from 
> queryLocators() into a throw of a NoAvailableLocatorsException.
>  # An appropriate exception, NoAvailableServersException, already exists for 
> the case where we were able to contact a locator but the locator was not able 
> to find any cache servers.
>  # According to my Git spelunking, queryLocators() has been obfuscating the 
> true cause of the failure since at least 2015.
> Without analyzing ServerLocator.pickServer() 
> (LocatorLoadSnapshot.getServerForConnection()) to discern why two locators 
> might disagree on how many cache servers are in the cluster, it seems to me 
> that we should modify AutoConnectionSourceImpl.queryLocators() so that:
>  * if it gets a ServerLocationResponse with hasResult() true, it immediately 
> returns that as it does now
>  * otherwise it keeps trying and it keeps track of the last (non-null) 
> ServerLocationResponse it has received
>  * it returns the last non-null ServerLocationResponse it received (otherwise 
> it returns null)
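
The three bullets above can be sketched as a loop. This is only an illustration of the proposal, not the code merged for GEODE-9808: QueryLocatorsSketch, the nested Response interface, and the ask function (which returns null for a locator that could not be reached) are hypothetical stand-ins for the real Geode types and network calls.

```java
import java.util.List;
import java.util.function.Function;

class QueryLocatorsSketch {

  /** Hypothetical stand-in for Geode's ServerLocationResponse. */
  interface Response {
    boolean hasResult();
  }

  /**
   * Asks each locator in turn. The ask function is an assumption of this
   * sketch: it returns null when a locator cannot be reached.
   */
  static <L> Response queryLocators(List<L> locators, Function<L, Response> ask) {
    Response lastResponse = null;
    for (L locator : locators) {
      Response response = ask.apply(locator);
      if (response == null) {
        continue; // locator unreachable, try the next one
      }
      if (response.hasResult()) {
        return response; // a server was found: return immediately, as today
      }
      lastResponse = response; // locator answered but knows of no servers
    }
    // Null only if no locator responded at all; otherwise the last
    // non-null (but empty) response, so callers can tell the cases apart.
    return lastResponse;
  }
}
```

The point of keeping the last empty response is that a null return then means exactly one thing: no locator could be contacted.
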
> With that in hand, we can change the three call locations in 
> AutoConnectionSourceImpl: findServer(), findReplacementServer(), and 
> findServersForQueue() to each throw NoAvailableLocatorsException if no 
> locator responded, or NoAvailableServersException if a locator responded with 
> a ClientConnectionResponse for which hasResult() returns false.
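
A minimal sketch of the call-site translation described above, under stated assumptions: the two exception classes are local stand-ins that only share their names with the Geode classes, Response and the responseFor helper are hypothetical, and the "No servers found" message is invented for illustration. The real findServer() also performs the locator query itself; here the query result is passed in to keep the example self-contained.

```java
import java.util.List;

class FindServerSketch {

  /** Hypothetical stand-in for ClientConnectionResponse. */
  interface Response {
    boolean hasResult();

    String getServer();
  }

  // Local stand-ins named after the Geode exceptions; not the real classes.
  static class NoAvailableLocatorsException extends RuntimeException {
    NoAvailableLocatorsException(String message) {
      super(message);
    }
  }

  static class NoAvailableServersException extends RuntimeException {
    NoAvailableServersException(String message) {
      super(message);
    }
  }

  /** Mirrors ClientConnectionResponse(null): a null server means no result. */
  static Response responseFor(String server) {
    return new Response() {
      @Override
      public boolean hasResult() {
        return server != null;
      }

      @Override
      public String getServer() {
        return server;
      }
    };
  }

  /** Translates the outcome of a locator query into a server or an exception. */
  static String findServer(Response response, List<String> locators) {
    if (response == null) {
      // No locator responded at all.
      throw new NoAvailableLocatorsException(
          "Unable to connect to any locators in the list " + locators);
    }
    if (!response.hasResult()) {
      // A locator responded, but it knows of no cache servers.
      throw new NoAvailableServersException("No servers found"); // assumed wording
    }
    return response.getServer();
  }
}
```

With this split, a client whose locators are reachable but whose cluster has no cache servers gets NoAvailableServersException rather than a message claiming the locators are unreachable.
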



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
