[ 
https://issues.apache.org/jira/browse/GEODE-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943811#comment-16943811
 ] 

Xiaojian Zhou commented on GEODE-7258:
--------------------------------------

The root cause is:
executeOnServer() calls ServerRegionProxy.executeFunction(), where HA and 
retries are handled. 

If singleHop is enabled (the default setting), it calls 
ExecuteRegionFunctionSingleHopOp.execute(retryAttempts == -1, the default) {
    retryAttempts = SingleHopClientExecutor.submitAllHA(mRetryAttempts == -1)
    If retryAttempts > 0, singleHop has finished its first try; 
    ExecuteRegionFunctionOp.execute(retryAttempts - 1) then handles all the 
    remaining retries.
} 

Inside submitAllHA()'s exception handling, since retryAttempts == -1, it is 
reassigned as:
maxRetryAttempts = pool.getConnectionSource().getAllServers().size() - 1;
This code is wrong, because at that moment the failed server has already been 
removed, so pool.getConnectionSource().getAllServers().size() is already one 
smaller. We should not subtract 1 again. For example, with 2 servers, if the 
first try using singleHop to server-1 fails, then in this exception handling 
pool.getConnectionSource().getAllServers().size() is 1, and maxRetryAttempts 
becomes 0. This is root cause-1. 
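
The arithmetic in the 2-server example can be reproduced with a small 
standalone sketch. This is not Geode code: the class RetryBudgetSketch and its 
methods are hypothetical, and the plain list only stands in for what 
pool.getConnectionSource().getAllServers() would return once the failed server 
has been removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RetryBudgetSketch {
  // Mirrors the computation described above: size() - 1 applied to a list
  // from which the failed server has already been removed.
  static int buggyMaxRetryAttempts(List<String> remainingServers) {
    return remainingServers.size() - 1;
  }

  // The alternative suggested in this comment: do not subtract 1 again.
  static int proposedMaxRetryAttempts(List<String> remainingServers) {
    return remainingServers.size();
  }

  public static void main(String[] args) {
    List<String> servers = new ArrayList<>(Arrays.asList("server-1", "server-2"));
    servers.remove("server-1"); // the first singleHop try to server-1 failed

    System.out.println("buggy maxRetryAttempts = " + buggyMaxRetryAttempts(servers));
    System.out.println("proposed maxRetryAttempts = " + proposedMaxRetryAttempts(servers));
  }
}
```

With 2 servers this prints 0 for the current computation and 1 for the 
proposed one, so only the second leaves a retry for server-2.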

Root cause-2 is: 
After singleHop fails, ExecuteRegionFunctionOp.execute() should not be passed 
"retryAttempts - 1"; it should be passed retryAttempts. This did not directly 
trigger this bug, since even when the value is reduced to 0 the operation is 
executed at least once by the do-while logic. But it makes a difference when 
there are 3 or 4 servers. 
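
The "executed at least once" behaviour of a do-while retry loop can be seen in 
a simplified model. This is only a sketch under assumptions: 
DoWhileRetrySketch and executionsWhenAllFail are hypothetical names, the loop 
is not copied from ExecuteRegionFunctionOp, and every try is assumed to fail 
so that the whole retry budget gets used.

```java
public class DoWhileRetrySketch {
  // Counts how many times the operation is tried when every try fails.
  // Assumption: a retry is allowed while the number of failures so far
  // does not exceed maxRetryAttempts.
  static int executionsWhenAllFail(int maxRetryAttempts) {
    int executions = 0;
    int failures = 0;
    boolean reexecute;
    do {
      executions++; // the body of a do-while always runs at least once
      failures++;   // in this model, the try fails
      reexecute = failures <= maxRetryAttempts;
    } while (reexecute);
    return executions;
  }

  public static void main(String[] args) {
    for (int retryAttempts = 0; retryAttempts <= 2; retryAttempts++) {
      System.out.println("maxRetryAttempts=" + retryAttempts
          + " -> executions=" + executionsWhenAllFail(retryAttempts));
    }
  }
}
```

In this model a budget of 0 still gives one execution, while passing 
retryAttempts - 1 instead of retryAttempts always costs one execution 
whenever retryAttempts is at least 1.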

If singleHop is disabled, it calls 
ExecuteRegionFunctionOp.execute(retryAttempts), which is also used after 
singleHop has failed and there are still a few retries to go. 

With ExecuteRegionFunctionOp.execute(-1), i.e. singleHop disabled, when the 
first try fails there is similar logic in the exception handling: 
maxRetryAttempts = ((PoolImpl) 
pool).getConnectionSource().getAllServers().size() - 1

This is wrong, and it is root cause-3. As described earlier, it causes a 
2-server environment to be tried only once. 

> RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated
>  > test[from_v1.10.0, with reindex=true] FAILED
> ------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-7258
>                 URL: https://issues.apache.org/jira/browse/GEODE-7258
>             Project: Geode
>          Issue Type: Bug
>          Components: tests
>    Affects Versions: 1.10.0
>            Reporter: Mark Hanson
>            Assignee: Xiaojian Zhou
>            Priority: Major
>              Labels: GeodeCommons
>         Attachments: runsWithFineLevelLogging.zip
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated
>  > test[from_v1.10.0, with reindex=true] FAILED
> This test is failing repeatedly. To my eyes, the logs do not indicate the 
> source of the problem.
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/UpgradeTestOpenJDK11/builds/1112]
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/UpgradeTestOpenJDK11/builds/1113]
>  
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI 
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0169/test-results/upgradeTest/1569861659/
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test report artifacts from this job are available at:
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0169/test-artifacts/1569861659/upgradetestfiles-OpenJDK11-1.11.0-SNAPSHOT.0169.tgz
>  
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI 
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0170/test-results/upgradeTest/1569863814/
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test report artifacts from this job are available at:
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0170/test-artifacts/1569863814/upgradetestfiles-OpenJDK11-1.11.0-SNAPSHOT.0170.tgz
>  
> {noformat}
> Task :geode-lucene:upgradeTest
> org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated
>  > test[from_v1.10.0, with reindex=true] FAILED
>  org.apache.geode.test.dunit.RMIException: While invoking 
> org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated$$Lambda$172/0x00000008407b2440.run
>  in VM 3 running on Host 38006782570a with 4 VMs with version 1.10.0
>  at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:579)
>  at org.apache.geode.test.dunit.VM.invoke(VM.java:406)
>  at 
> org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated.test(RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated.java:111)
> Caused by:
>  java.lang.reflect.InvocationTargetException
>  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:566)
>  at 
> org.apache.geode.cache.lucene.LuceneSearchWithRollingUpgradeDUnit.verifyLuceneQueryResults(LuceneSearchWithRollingUpgradeDUnit.java:381)
>  at 
> org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated.lambda$test$c93719d5$2(RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated.java:111)
> Caused by:
>  org.apache.geode.cache.execute.FunctionException: 
> org.apache.geode.cache.client.ServerConnectivityException: Could not create a 
> new connection to server 38006782570a:24478
>  at 
> org.apache.geode.internal.cache.execute.ServerRegionFunctionExecutor.executeOnServer(ServerRegionFunctionExecutor.java:209)
>  at 
> org.apache.geode.internal.cache.execute.ServerRegionFunctionExecutor.executeFunction(ServerRegionFunctionExecutor.java:152)
>  at 
> org.apache.geode.internal.cache.execute.ServerRegionFunctionExecutor.execute(ServerRegionFunctionExecutor.java:379)
>  at 
> org.apache.geode.cache.lucene.internal.LuceneServiceImpl.waitUntilFlushed(LuceneServiceImpl.java:658)
>  ... 6 more
> Caused by:
>  org.apache.geode.cache.client.ServerConnectivityException: Could not create 
> a new connection to server 38006782570a:24478
> 108 tests completed, 1 failed
> > Task :geode-lucene:upgradeTest FAILED {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
