[ 
https://issues.apache.org/jira/browse/GEODE-969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151065#comment-15151065
 ] 

Barry Oglesby commented on GEODE-969:
-------------------------------------

Here are notes from an internal 
ParallelWANPersistenceEnabledGatewaySenderOffHeapDUnitTest run.

The ParallelWANPersistenceEnabledGatewaySenderOffHeapDUnitTest 
testPartitionedRegionWithPersistentGatewaySender test logged 4,271,656 lines 
compared with 266,590 lines in a good run.

The test stops the 7 JVMs sequentially from vm_0 to vm_7. For some reason, vm_6 
is not able to shutdown immediately. Instead it blocks for 2 seconds waiting 
for one of its resource manager thread pools to stop. In the meantime, vm_7 is 
trying a gateway sender bucket from vm_6 and failing.

Here are some relevant logs:

Here are the test method invocations:
{noformat}
[vm_0][info 2016/02/09 09:10:49.177 PST <RMI TCP 
Connection(6)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.createFirstLocatorWithDSId 
with 1 args
[vm_1][info 2016/02/09 09:10:49.206 PST <RMI TCP 
Connection(6)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.createFirstRemoteLocator 
with 2 args
[vm_2][info 2016/02/09 09:10:49.237 PST <RMI TCP 
Connection(5)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.createReceiver with 1 args
[vm_3][info 2016/02/09 09:10:49.589 PST <RMI TCP 
Connection(5)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.createReceiver with 1 args
[vm_4][info 2016/02/09 09:10:49.949 PST <RMI TCP 
Connection(2)-10.118.32.37> tid=0xb1] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.createCache with 1 args
[vm_5][info 2016/02/09 09:10:50.288 PST <RMI TCP 
Connection(2)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.createCache with 1 args
[vm_6][info 2016/02/09 09:10:50.632 PST <RMI TCP 
Connection(5)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.createCache with 1 args
[vm_7][info 2016/02/09 09:10:50.979 PST <RMI TCP 
Connection(5)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.createCache with 1 args
[vm_4][info 2016/02/09 09:10:51.332 PST <RMI TCP 
Connection(2)-10.118.32.37> tid=0xb1] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.createSender with 9 args
[vm_5][info 2016/02/09 09:10:51.338 PST <RMI TCP 
Connection(2)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.createSender with 9 args
[vm_6][info 2016/02/09 09:10:51.349 PST <RMI TCP 
Connection(5)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.createSender with 9 args
[vm_7][info 2016/02/09 09:10:51.359 PST <RMI TCP 
Connection(5)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.createSender with 9 args
[vm_4][info 2016/02/09 09:10:51.372 PST <RMI TCP 
Connection(2)-10.118.32.37> tid=0xb1] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.createPartitionedRegion 
with 5 args
[vm_5][info 2016/02/09 09:10:51.382 PST <RMI TCP 
Connection(2)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.createPartitionedRegion 
with 5 args
[vm_6][info 2016/02/09 09:10:51.404 PST <RMI TCP 
Connection(5)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.createPartitionedRegion 
with 5 args
[vm_7][info 2016/02/09 09:10:51.422 PST <RMI TCP 
Connection(5)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.createPartitionedRegion 
with 5 args
[vm_4][info 2016/02/09 09:10:51.450 PST <RMI TCP 
Connection(2)-10.118.32.37> tid=0xb1] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.startSender with 1 args
[vm_5][info 2016/02/09 09:10:51.499 PST <RMI TCP 
Connection(2)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.startSender with 1 args
[vm_6][info 2016/02/09 09:10:51.555 PST <RMI TCP 
Connection(5)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.startSender with 1 args
[vm_7][info 2016/02/09 09:10:51.624 PST <RMI TCP 
Connection(5)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.startSender with 1 args
[vm_2][info 2016/02/09 09:10:51.704 PST <RMI TCP 
Connection(5)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.createPartitionedRegion 
with 5 args
[vm_3][info 2016/02/09 09:10:51.711 PST <RMI TCP 
Connection(5)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.createPartitionedRegion 
with 5 args
[vm_4][info 2016/02/09 09:10:51.725 PST <RMI TCP 
Connection(2)-10.118.32.37> tid=0xb1] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.doPuts with 2 args
[vm_2][info 2016/02/09 09:10:55.520 PST <RMI TCP 
Connection(5)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.validateRegionSize with 2 
args
[vm_3][info 2016/02/09 09:10:56.832 PST <RMI TCP 
Connection(5)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.validateRegionSize with 2 
args
[vm_0][info 2016/02/09 09:10:56.846 PST <RMI TCP 
Connection(6)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.cleanupVM with 0 args
[vm_1][info 2016/02/09 09:10:56.967 PST <RMI TCP 
Connection(6)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.cleanupVM with 0 args
[vm_2][info 2016/02/09 09:10:57.081 PST <RMI TCP 
Connection(5)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.cleanupVM with 0 args
[vm_3][info 2016/02/09 09:10:57.233 PST <RMI TCP 
Connection(5)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.cleanupVM with 0 args
[vm_4][info 2016/02/09 09:10:57.364 PST <RMI TCP 
Connection(2)-10.118.32.37> tid=0xb1] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.cleanupVM with 0 args
[vm_5][info 2016/02/09 09:10:58.528 PST <RMI TCP 
Connection(2)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.cleanupVM with 0 args
[vm_6][info 2016/02/09 09:11:13.661 PST <RMI TCP 
Connection(5)-10.118.32.37> tid=0x12] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.cleanupVM with 0 args
[vm_7][info 2016/02/09 09:13:13.921 PST <RMI TCP 
Connection(7)-10.118.32.37> tid=0x342] Received method: 
com.gemstone.gemfire.internal.cache.wan.WANTestBase.cleanupVM with 0 args
{noformat}
In between vm_6 and vm_7 being shutdown, vm_7 is continuously trying to GII a 
gateway sender bucket from vm_6.

It logs these messages for each bucket every few milliseconds from 09:11:13.670 
to 09:13:13.701:
{noformat}
[vm_7][info 2016/02/09 09:11:13.670 PST <Pooled Waiting Message Processor 
12> tid=0x31d] Region _B__ln__PARALLEL__GATEWAY__SENDER__QUEUE_99 requesting 
initial image from japan(7243)<ec><v3>:1031
[vm_7][info 2016/02/09 09:11:13.670 PST <Pooled Waiting Message Processor 
12> tid=0x31d] _B__ln__PARALLEL__GATEWAY__SENDER__QUEUE_99 failed to get 
image from japan(7243)<ec><v3>:1031
[vm_7][info 2016/02/09 09:11:13.670 PST <Pooled Waiting Message Processor 
12> tid=0x31d] Region _B__ln__PARALLEL__GATEWAY__SENDER__QUEUE_99 
initialized persistent id: 
japan.gemstone.com/10.118.32.37:/japan1/users/build/jenkins/blds/workspace/Geode_develop_DistributedTests/gemfire-wan/build/distributedTest/dunit/vm7/ln_disk_1455037851360_7
 created at timestamp 1455037851625 version 0 diskStoreId 
919f2c2d3620438d-bb67ce053723de51 name null with data from 
japan(7243)<ec><v3>:1031.
[vm_7][info 2016/02/09 09:11:13.671 PST <Pooled Waiting Message Processor 
12> tid=0x31d] GII failed from all sources, but members are still online. 
Retrying the GII.
{noformat}
It finally stops, when vm_6 stops waiting for resource manager threads and 
shuts down:
{noformat}
[vm_6][warn 2016/02/09 09:13:13.669 PST <RMI TCP 
Connection(5)-10.118.32.37> tid=0x12] Failed to stop resource manager 
threads in 120 seconds
[vm_6][info 2016/02/09 09:13:13.816 PST <RMI TCP 
Connection(5)-10.118.32.37> tid=0x12] Shutting down DistributionManager 
japan(7243)<ec><v3>:1031. 
{noformat}

> Various WAN dunit tests sometimes cause OOME during combineReports task
> -----------------------------------------------------------------------
>
>                 Key: GEODE-969
>                 URL: https://issues.apache.org/jira/browse/GEODE-969
>             Project: Geode
>          Issue Type: Bug
>          Components: wan
>            Reporter: Barry Oglesby
>
> The combineReports task sometimes throws a 'java.lang.OutOfMemoryError: Java 
> heap space' error when creating the JUnit test report like:
> {noformat}
> :combineReports
> All test reports at 
> /japan1/users/build/jenkins/blds/workspace/GemFire_develop_closed_CC/gemfire/closed/build/reports/combined
> FAILURE: Build failed with an exception.
> Where:
> Build file 
> '/japan1/users/build/jenkins/blds/workspace/GemFire_develop_closed_CC/gemfire/closed/pivotalgf-assembly/build.gradle'
>  line: 156
> What went wrong:
> Execution failed for task ':pivotalgf-assembly:legacyDunit'.
> > There were failing tests.
> Try:
> Run with --stacktrace option to get the stack trace. Run with --info or 
> --debug option to get more log output.
> BUILD FAILED
> Total time: 2 hrs 8 mins 51.811 secs
> Recording test results
> ERROR: Publisher 'Publish JUnit test result report' aborted due to exception:
> java.io.IOException: remote file operation failed: 
> /japan1/users/build/jenkins/blds/workspace/GemFire_develop_closed_CC at 
> hudson.remoting.Channel@a3f9b9e:Japan: java.io.IOException: Remote call on 
> Japan failed
> at hudson.FilePath.act(FilePath.java:987)
> at hudson.FilePath.act(FilePath.java:969)
> at hudson.tasks.junit.JUnitParser.parseResult(JUnitParser.java:90)
> at hudson.tasks.junit.JUnitResultArchiver.parse(JUnitResultArchiver.java:120)
> at 
> hudson.tasks.junit.JUnitResultArchiver.perform(JUnitResultArchiver.java:137)
> at 
> hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:75)
> at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
> at 
> hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:779)
> at 
> hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:726)
> at hudson.model.Build$BuildExecution.post2(Build.java:185)
> at 
> hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:671)
> at hudson.model.Run.execute(Run.java:1766)
> at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
> at hudson.model.ResourceController.execute(ResourceController.java:98)
> at hudson.model.Executor.run(Executor.java:408)
> Caused by: java.io.IOException: Remote call on Japan failed
> at hudson.remoting.Channel.call(Channel.java:786)
> at hudson.FilePath.act(FilePath.java:980)
> ... 14 more
> Caused by: java.lang.OutOfMemoryError: Java heap space
> at 
> com.sun.org.apache.xerces.internal.util.XMLStringBuffer.append(XMLStringBuffer.java:208)
> at 
> com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanData(XMLEntityScanner.java:1378)
> at 
> com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanCDATASection(XMLDocumentFragmentScannerImpl.java:1654)
> at 
> com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:3020)
> at 
> com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
> at 
> com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:117)
> at 
> com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
> at 
> com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
> at 
> com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
> at 
> com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
> at 
> com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
> at 
> com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:649)
> at org.dom4j.io.SAXReader.read(SAXReader.java:465)
> at org.dom4j.io.SAXReader.read(SAXReader.java:264)
> at hudson.tasks.junit.SuiteResult.parse(SuiteResult.java:123)
> at hudson.tasks.junit.TestResult.parse(TestResult.java:282)
> at hudson.tasks.junit.TestResult.parsePossiblyEmpty(TestResult.java:228)
> at hudson.tasks.junit.TestResult.parse(TestResult.java:163)
> at hudson.tasks.junit.TestResult.parse(TestResult.java:146)
> at hudson.tasks.junit.TestResult.(TestResult.java:122)
> at 
> hudson.tasks.junit.JUnitParser$ParseResultCallable.invoke(JUnitParser.java:119)
> at 
> hudson.tasks.junit.JUnitParser$ParseResultCallable.invoke(JUnitParser.java:93)
> at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2691)
> at hudson.remoting.UserRequest.perform(UserRequest.java:121)
> at hudson.remoting.UserRequest.perform(UserRequest.java:49)
> at hudson.remoting.Request$2.run(Request.java:326)
> at 
> hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> at ......remote call to Japan(Native Method)
> at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1413)
> at hudson.remoting.UserResponse.retrieve(UserRequest.java:221)
> at hudson.remoting.Channel.call(Channel.java:778)
> ... 15 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to