John Blum created GEODE-7763:
--------------------------------
Summary: Apache Geode 1.11 severely and negatively impacts
performance and resource utilization
Key: GEODE-7763
URL: https://issues.apache.org/jira/browse/GEODE-7763
Project: Geode
Issue Type: Bug
Components: client/server
Reporter: John Blum
This problem was first observed in Apache Geode 1.11.0. The problem was not
present in Apache Geode 1.9.2. This problem is an issue for Apache Geode 1.10
as well!
After upgrading _Spring Session for Apache Geode_ (SSDG) 2.3 to _Spring Data
for Apache Geode_ (SDG) Neumann/2.3, which is based on Apache Geode 1.11, this
problem with SSDG's test suite started occurring.
_Spring Session for Apache Geode_ (SSDG) 2.2, which is based on _Spring Data
for Apache Geode_ (SDG) Moore/2.2, pulls in Apache Geode 1.9.2. This problem
did not occur in SSDG 2.2.
Out of curiosity, I wondered whether this problem affects (i.e. was actually
introduced in) Apache Geode 1.10.0. So, I configured SSDG 2.3 to pull in SDG
Moore/2.2 but run with Apache Geode 1.10. The problem occurred with Apache
Geode 1.10 as well!
The SSDG test class in question, affected by Geode's deficiencies, is the
[MultiThreadedHighlyConcurrentClientServerSessionOperationsIntegrationTests|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java].
The test class was modeled after a customer UC, who were using Spring Session
and Apache Geode/Pivotal GemFire as the HTTP Session state management provider,
therefore it simulates their highly concurrent environment.
The test class has 2 primary parameters: [Thread
Count|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L90]
and the [Workload
Size|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L91].
The "_Workload Size_" should not be confused with the "_Payload Size_" of the
individual objects passed to the Geode data access operations (i.e. {{gets}},
{{puts}}, {{removes}}). The "_Workload Size_" merely determines the number of
{{get}}, {{put}} or {{remove}} operations performed on the (Session) Region
over the duration of the test run. Certain operations are "favored" over
others, therefore the number of {{gets}}, {{puts}} and {{removes}} is weighted.
The "_Payload_" in this case is a (HTTP) {{Session}} object and the "size" is
directly proportional to the number of Session attributes stored in the Session.
As you can see from the [test class
configuration|https://github.com/spring-projects/spring-session-data-geode/blob/2.2.2.RELEASE/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L90-L91]
in *SSDG* {{2.2}}, the *Thread Count* was set to *180* and the *Workload Size*
(or number of Region operations) was set to *10,000*.
This had to be significantly adjusted in SSDG 2.3 using Apache Geode 1.11 (and,
as it turns out, Apache Geode 1.10 as well), as can be seen in the {{2.3.0.M1}}
release bits source,
[here|https://github.com/spring-projects/spring-session-data-geode/blob/2.3.0.M1/spring-session-data-geode/src/integration-test/java/org/springframework/session/data/gemfire/MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java#L94-L95].
It turns out different combinations of the Thread Count (number of workers, or
"concurrent Sessions") and Workload Size ultimately determine whether this test
class passes or not.
In other words, if I increase the Thread Count, then the Workload Size must
decrease, otherwise the test fails! If I increase the Workload Size, then the
Thread Count must decrease, otherwise again the test fails!
I tried with different combinations of Thread Count and Workload Size until the
test passed. More often than not 180 Threads with 3000 Regions operations
worked, but was right on the cusp of failing, therefore, I settled on 180
Threads (which nearly matches the customers environment of 200 clients) and
2000 concurrent Region operations.
The point of the test class is to assert the state of the Session is consistent
at the end of the test run.
However, before this test can even finish, the client, as in the
{{ClientCache}} instance starts failing with Exceptions, specifically:
{code:java}
java.lang.RuntimeException: Session Access Task Failed
at
org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.safeFutureGet(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:298)
at
java.util.stream.ReferencePipeline$4$1.accept(ReferencePipeline.java:210)
at
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.IntPipeline.reduce(IntPipeline.java:456)
at java.util.stream.IntPipeline.sum(IntPipeline.java:414)
at
org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.runSessionWorkload(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:313)
at
org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.concurrentSessionAccessIsCorrect(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:324)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at
org.springframework.test.context.junit4.statements.RunBeforeTestExecutionCallbacks.evaluate(RunBeforeTestExecutionCallbacks.java:74)
at
org.springframework.test.context.junit4.statements.RunAfterTestExecutionCallbacks.evaluate(RunAfterTestExecutionCallbacks.java:84)
at
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at
org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:75)
at
org.springframework.test.context.junit4.statements.RunAfterTestMethodCallbacks.evaluate(RunAfterTestMethodCallbacks.java:86)
at
org.springframework.test.context.junit4.statements.SpringRepeat.evaluate(SpringRepeat.java:84)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at
org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:251)
at
org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:97)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at
org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61)
at
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at
org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:70)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at
org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:190)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at
com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
at
com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:230)
at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:58)
Caused by: java.util.concurrent.ExecutionException:
org.springframework.dao.DataAccessResourceFailureException: Pool unexpected
socket timed out on client connection=Pooled Connection to localhost:60964:
Connection[DESTROYED]). Server unreachable: could not connect after 1 attempts;
nested exception is org.apache.geode.cache.client.ServerConnectivityException:
Pool unexpected socket timed out on client connection=Pooled Connection to
localhost:60964: Connection[DESTROYED]). Server unreachable: could not connect
after 1 attempts
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at
org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.safeFutureGet(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:295)
... 43 more
Caused by: org.springframework.dao.DataAccessResourceFailureException: Pool
unexpected socket timed out on client connection=Pooled Connection to
localhost:60964: Connection[DESTROYED]). Server unreachable: could not connect
after 1 attempts; nested exception is
org.apache.geode.cache.client.ServerConnectivityException: Pool unexpected
socket timed out on client connection=Pooled Connection to localhost:60964:
Connection[DESTROYED]). Server unreachable: could not connect after 1 attempts
at
org.springframework.data.gemfire.GemfireCacheUtils.convertGemfireAccessException(GemfireCacheUtils.java:235)
at
org.springframework.data.gemfire.GemfireAccessor.convertGemFireAccessException(GemfireAccessor.java:93)
at
org.springframework.data.gemfire.GemfireTemplate.put(GemfireTemplate.java:200)
at
org.springframework.session.data.gemfire.GemFireOperationsSessionRepository.doSave(GemFireOperationsSessionRepository.java:226)
at
org.springframework.session.data.gemfire.GemFireOperationsSessionRepository.save(GemFireOperationsSessionRepository.java:186)
at
org.springframework.session.data.gemfire.AbstractGemFireIntegrationTests.save(AbstractGemFireIntegrationTests.java:409)
at
org.springframework.session.data.gemfire.MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.lambda$newAddSessionAttributeTask$2(MultiThreadedHighlyConcurrentClientServerHttpSessionAccessIntegrationTests.java:216)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.geode.cache.client.ServerConnectivityException: Pool
unexpected socket timed out on client connection=Pooled Connection to
localhost:60964: Connection[DESTROYED]). Server unreachable: could not connect
after 1 attempts
at
org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:659)
at
org.apache.geode.cache.client.internal.OpExecutorImpl.handleException(OpExecutorImpl.java:501)
at
org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:153)
at
org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:108)
at
org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:772)
at org.apache.geode.cache.client.internal.PutOp.execute(PutOp.java:89)
at
org.apache.geode.cache.client.internal.ServerRegionProxy.put(ServerRegionProxy.java:159)
at
org.apache.geode.internal.cache.LocalRegion.serverPut(LocalRegion.java:3035)
at
org.apache.geode.internal.cache.LocalRegion.cacheWriteBeforePut(LocalRegion.java:3152)
at
org.apache.geode.internal.cache.ProxyRegionMap.basicPut(ProxyRegionMap.java:238)
at
org.apache.geode.internal.cache.LocalRegion.virtualPut(LocalRegion.java:5580)
at
org.apache.geode.internal.cache.LocalRegionDataView.putEntry(LocalRegionDataView.java:162)
at
org.apache.geode.internal.cache.LocalRegion.basicPut(LocalRegion.java:5036)
at
org.apache.geode.internal.cache.LocalRegion.validatedPut(LocalRegion.java:1635)
at
org.apache.geode.internal.cache.LocalRegion.put(LocalRegion.java:1622)
at
org.apache.geode.internal.cache.AbstractRegion.put(AbstractRegion.java:442)
at
org.springframework.data.gemfire.GemfireTemplate.put(GemfireTemplate.java:197)
... 8 more
{code}
...
--
This message was sent by Atlassian Jira
(v8.3.4#803005)