[
https://issues.apache.org/jira/browse/GEODE-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943811#comment-16943811
]
Xiaojian Zhou commented on GEODE-7258:
--------------------------------------
The root cause is as follows.
executeOnServer() calls ServerRegionProxy.executeFunction(), which is where HA
and retries are handled.
If singleHop is enabled (the default setting), it calls
ExecuteRegionFunctionSingleHopOp.execute(retryAttempts == -1, the default) {
  retryAttempts = SingleHopClientExecutor.submitAllHA(mRetryAttempts == -1)
  if retryAttempts > 0, singleHop has finished its first try; hand all the
  remaining retries to ExecuteRegionFunctionOp.execute(retryAttempts - 1).
}
Inside submitAllHA()'s exception handling, since retryAttempts == -1, it is
reassigned to:
maxRetryAttempts = pool.getConnectionSource().getAllServers().size() - 1;
This code is wrong because, at that moment,
pool.getConnectionSource().getAllServers() has already dropped the failed
server, so its size() is already reduced. We should not subtract 1 again. For
example, with 2 servers, if the first try using singleHop to server-1 fails,
then in this exception handling
pool.getConnectionSource().getAllServers().size() will be 1 and
maxRetryAttempts becomes 0. This is root cause-1.
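To make the arithmetic in root cause-1 concrete, here is a minimal standalone sketch. It does not use Geode classes: the class name, the method names, and the plain list standing in for pool.getConnectionSource().getAllServers() are all assumptions made for illustration, and the "suggested" variant only reflects the "do not subtract 1 again" idea above, not necessarily the final patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in; none of these names exist in Geode.
class RetryBudgetSketch {

  // Mirrors the "size() - 1" computation described above.
  static int retryLimitSubtractingOne(List<String> remainingServers) {
    return remainingServers.size() - 1;
  }

  // Assumed alternative: the failed server is already gone from the list,
  // so the remaining size is used as the retry limit directly.
  static int retryLimitAsSuggested(List<String> remainingServers) {
    return remainingServers.size();
  }

  public static void main(String[] args) {
    List<String> servers = new ArrayList<>(List.of("server-1", "server-2"));

    // The first try against server-1 fails and the list no longer contains it.
    servers.remove("server-1");

    System.out.println("subtracting one: " + retryLimitSubtractingOne(servers)); // 0
    System.out.println("as suggested:    " + retryLimitAsSuggested(servers));    // 1
  }
}
```

With 2 servers the first variant leaves no retry for server-2, which matches the single failed attempt seen in the test.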
Root cause-2 is:
After singleHop fails, ExecuteRegionFunctionOp.execute() should not be called
with "retryAttempts - 1" as its parameter; it should be called with
retryAttempts. This did not directly trigger this bug, because even when the
value is reduced to 0 the do-while logic still executes the function at least
once. But it will make a difference when there are 3 or 4 servers.
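The effect described in root cause-2 can be sketched with a simplified do-while loop. This is only a model under stated assumptions: the class and method names are hypothetical, every attempt is assumed to fail, and the loop condition (retry while the failure count has not exceeded the limit) is my simplification rather than the actual ExecuteRegionFunctionOp code.

```java
// Hypothetical model of a do-while retry loop; not Geode code.
class DoWhileRetrySketch {

  // Returns how many attempts are made when every attempt fails.
  static int attemptsWhenAllFail(int maxRetryAttempts) {
    int attempts = 0;
    int failures = 0;
    boolean retry;
    do {
      attempts++;   // the body always runs at least once
      failures++;   // assumed: this attempt failed
      retry = failures <= maxRetryAttempts;
    } while (retry);
    return attempts;
  }

  public static void main(String[] args) {
    // A limit of 0 still gives one attempt because of the do-while.
    System.out.println(attemptsWhenAllFail(0)); // 1

    // Passing "retryAttempts - 1" instead of retryAttempts loses one attempt.
    int retryAttempts = 3;
    System.out.println(attemptsWhenAllFail(retryAttempts));     // 4
    System.out.println(attemptsWhenAllFail(retryAttempts - 1)); // 3
  }
}
```

In this model the lost attempt is invisible when the limit is already 0 or 1, and only shows up as a skipped server once there are more servers to fall back on.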
If singleHop is disabled, it calls
ExecuteRegionFunctionOp.execute(retryAttempts), which is also used after
singleHop has failed and there are still a few retries to go.
With ExecuteRegionFunctionOp.execute(-1), i.e. singleHop is disabled, when the
first try fails, the exception handling contains similar logic:
maxRetryAttempts = ((PoolImpl)
pool).getConnectionSource().getAllServers().size() - 1
This is wrong and is root cause-3. As described earlier, it means that a
2-server environment is only tried once.
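Putting the pieces together for the non-singleHop path, the sketch below simulates a retry loop that resolves the -1 default lazily inside its failure handling, after the failed server has already been dropped. Everything here is hypothetical (the class, the UNSET constant, the subtractOne switch, and the in-memory server list); it only illustrates why a 2-server environment ends up with a single attempt.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation; not the ExecuteRegionFunctionOp implementation.
class LazyRetryLimitSketch {
  static final int UNSET = -1; // stands in for the default retryAttempts == -1

  // Returns the number of attempts made when every server fails.
  static int attemptsWhenAllServersFail(List<String> servers, boolean subtractOne) {
    List<String> live = new ArrayList<>(servers);
    if (live.isEmpty()) {
      return 0;
    }
    int maxRetryAttempts = UNSET;
    int failures = 0;
    int attempts = 0;
    boolean retry;
    do {
      attempts++;
      // The attempt fails and the failed server is removed from the live list
      // before the failure handling below runs.
      live.remove(0);
      failures++;
      if (maxRetryAttempts == UNSET) {
        // Resolved only now, from the already reduced list.
        maxRetryAttempts = subtractOne ? live.size() - 1 : live.size();
      }
      retry = failures <= maxRetryAttempts && !live.isEmpty();
    } while (retry);
    return attempts;
  }

  public static void main(String[] args) {
    List<String> twoServers = List.of("server-1", "server-2");
    System.out.println(attemptsWhenAllServersFail(twoServers, true));  // 1
    System.out.println(attemptsWhenAllServersFail(twoServers, false)); // 2
  }
}
```

With the extra subtraction, server-2 is never tried; without it, each server gets one attempt.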
> RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated
> > test[from_v1.10.0, with reindex=true] FAILED
> ------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: GEODE-7258
> URL: https://issues.apache.org/jira/browse/GEODE-7258
> Project: Geode
> Issue Type: Bug
> Components: tests
> Affects Versions: 1.10.0
> Reporter: Mark Hanson
> Assignee: Xiaojian Zhou
> Priority: Major
> Labels: GeodeCommons
> Attachments: runsWithFineLevelLogging.zip
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated
> > test[from_v1.10.0, with reindex=true] FAILED
> This test is failing repeatedly. The logs do not to my eyes indicate what is
> the source of the problem.
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/UpgradeTestOpenJDK11/builds/1112]
> [https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/UpgradeTestOpenJDK11/builds/1113]
>
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0169/test-results/upgradeTest/1569861659/
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test report artifacts from this job are available at:
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0169/test-artifacts/1569861659/upgradetestfiles-OpenJDK11-1.11.0-SNAPSHOT.0169.tgz
>
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-= Test Results URI
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0170/test-results/upgradeTest/1569863814/
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Test report artifacts from this job are available at:
> http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0170/test-artifacts/1569863814/upgradetestfiles-OpenJDK11-1.11.0-SNAPSHOT.0170.tgz
>
> {noformat}
> Task :geode-lucene:upgradeTest
> org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated
> > test[from_v1.10.0, with reindex=true] FAILED
> org.apache.geode.test.dunit.RMIException: While invoking
> org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated$$Lambda$172/0x00000008407b2440.run
> in VM 3 running on Host 38006782570a with 4 VMs with version 1.10.0
> at org.apache.geode.test.dunit.VM.executeMethodOnObject(VM.java:579)
> at org.apache.geode.test.dunit.VM.invoke(VM.java:406)
> at
> org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated.test(RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated.java:111)
> Caused by:
> java.lang.reflect.InvocationTargetException
> at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:566)
> at
> org.apache.geode.cache.lucene.LuceneSearchWithRollingUpgradeDUnit.verifyLuceneQueryResults(LuceneSearchWithRollingUpgradeDUnit.java:381)
> at
> org.apache.geode.cache.lucene.RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated.lambda$test$c93719d5$2(RollingUpgradeQueryReturnsCorrectResultsAfterClientAndServersAreRolledOverAllBucketsCreated.java:111)
> Caused by:
> org.apache.geode.cache.execute.FunctionException:
> org.apache.geode.cache.client.ServerConnectivityException: Could not create a
> new connection to server 38006782570a:24478
> at
> org.apache.geode.internal.cache.execute.ServerRegionFunctionExecutor.executeOnServer(ServerRegionFunctionExecutor.java:209)
> at
> org.apache.geode.internal.cache.execute.ServerRegionFunctionExecutor.executeFunction(ServerRegionFunctionExecutor.java:152)
> at
> org.apache.geode.internal.cache.execute.ServerRegionFunctionExecutor.execute(ServerRegionFunctionExecutor.java:379)
> at
> org.apache.geode.cache.lucene.internal.LuceneServiceImpl.waitUntilFlushed(LuceneServiceImpl.java:658)
> ... 6 more
> Caused by:
> org.apache.geode.cache.client.ServerConnectivityException: Could not create
> a new connection to server 38006782570a:24478
> 108 tests completed, 1 failed
> > Task :geode-lucene:upgradeTest FAILED {noformat}
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)