[ https://issues.apache.org/jira/browse/GEODE-969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151065#comment-15151065 ]
Barry Oglesby commented on GEODE-969: ------------------------------------- Here are notes from an internal ParallelWANPersistenceEnabledGatewaySenderOffHeapDUnitTest run. The ParallelWANPersistenceEnabledGatewaySenderOffHeapDUnitTest testPartitionedRegionWithPersistentGatewaySender test logged 4,271,656 lines compared with 266,590 lines in a good run. The test stops the 7 JVMs sequentially from vm_0 to vm_7. For some reason, vm_6 is not able to shutdown immediately. Instead it blocks for 2 seconds waiting for one of its resource manager thread pools to stop. In the meantime, vm_7 is trying a gateway sender bucket from vm_6 and failing. Here are some relevant logs: Here are the test method invocations: {noformat} [vm_0][info 2016/02/09 09:10:49.177 PST <RMI TCP Connection(6)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.createFirstLocatorWithDSId with 1 args [vm_1][info 2016/02/09 09:10:49.206 PST <RMI TCP Connection(6)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.createFirstRemoteLocator with 2 args [vm_2][info 2016/02/09 09:10:49.237 PST <RMI TCP Connection(5)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.createReceiver with 1 args [vm_3][info 2016/02/09 09:10:49.589 PST <RMI TCP Connection(5)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.createReceiver with 1 args [vm_4][info 2016/02/09 09:10:49.949 PST <RMI TCP Connection(2)-10.118.32.37> tid=0xb1] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.createCache with 1 args [vm_5][info 2016/02/09 09:10:50.288 PST <RMI TCP Connection(2)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.createCache with 1 args [vm_6][info 2016/02/09 09:10:50.632 PST <RMI TCP Connection(5)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.createCache with 1 args [vm_7][info 2016/02/09 09:10:50.979 PST <RMI TCP Connection(5)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.createCache with 1 args [vm_4][info 2016/02/09 09:10:51.332 PST <RMI TCP Connection(2)-10.118.32.37> tid=0xb1] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.createSender with 9 args [vm_5][info 2016/02/09 09:10:51.338 PST <RMI TCP Connection(2)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.createSender with 9 args [vm_6][info 2016/02/09 09:10:51.349 PST <RMI TCP Connection(5)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.createSender with 9 args [vm_7][info 2016/02/09 09:10:51.359 PST <RMI TCP Connection(5)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.createSender with 9 args [vm_4][info 2016/02/09 09:10:51.372 PST <RMI TCP Connection(2)-10.118.32.37> tid=0xb1] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.createPartitionedRegion with 5 args [vm_5][info 2016/02/09 09:10:51.382 PST <RMI TCP Connection(2)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.createPartitionedRegion with 5 args [vm_6][info 2016/02/09 09:10:51.404 PST <RMI TCP Connection(5)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.createPartitionedRegion with 5 args [vm_7][info 2016/02/09 09:10:51.422 PST <RMI TCP Connection(5)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.createPartitionedRegion with 5 args [vm_4][info 2016/02/09 09:10:51.450 PST <RMI TCP Connection(2)-10.118.32.37> tid=0xb1] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.startSender with 1 args [vm_5][info 2016/02/09 09:10:51.499 PST <RMI TCP Connection(2)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.startSender with 1 args [vm_6][info 2016/02/09 09:10:51.555 PST <RMI TCP Connection(5)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.startSender with 1 args [vm_7][info 2016/02/09 09:10:51.624 PST <RMI TCP Connection(5)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.startSender with 1 args [vm_2][info 2016/02/09 09:10:51.704 PST <RMI TCP Connection(5)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.createPartitionedRegion with 5 args [vm_3][info 2016/02/09 09:10:51.711 PST <RMI TCP Connection(5)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.createPartitionedRegion with 5 args [vm_4][info 2016/02/09 09:10:51.725 PST <RMI TCP Connection(2)-10.118.32.37> tid=0xb1] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.doPuts with 2 args [vm_2][info 2016/02/09 09:10:55.520 PST <RMI TCP Connection(5)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.validateRegionSize with 2 args [vm_3][info 2016/02/09 09:10:56.832 PST <RMI TCP Connection(5)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.validateRegionSize with 2 args [vm_0][info 2016/02/09 09:10:56.846 PST <RMI TCP Connection(6)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.cleanupVM with 0 args [vm_1][info 2016/02/09 09:10:56.967 PST <RMI TCP Connection(6)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.cleanupVM with 0 args [vm_2][info 2016/02/09 09:10:57.081 PST <RMI TCP Connection(5)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.cleanupVM with 0 args [vm_3][info 2016/02/09 09:10:57.233 PST <RMI TCP Connection(5)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.cleanupVM with 0 args [vm_4][info 2016/02/09 09:10:57.364 PST <RMI TCP Connection(2)-10.118.32.37> tid=0xb1] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.cleanupVM with 0 args [vm_5][info 2016/02/09 09:10:58.528 PST <RMI TCP Connection(2)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.cleanupVM with 0 args [vm_6][info 2016/02/09 09:11:13.661 PST <RMI TCP Connection(5)-10.118.32.37> tid=0x12] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.cleanupVM with 0 args [vm_7][info 2016/02/09 09:13:13.921 PST <RMI TCP Connection(7)-10.118.32.37> tid=0x342] Received method: com.gemstone.gemfire.internal.cache.wan.WANTestBase.cleanupVM with 0 args {noformat} In between vm_6 and vm_7 being shutdown, vm_7 is continuously trying to GII a gateway sender bucket from vm_6. It logs these messages for each bucket every few milliseconds from 09:11:13.670 to 09:13:13.701: {noformat} [vm_7][info 2016/02/09 09:11:13.670 PST <Pooled Waiting Message Processor 12> tid=0x31d] Region _B__ln__PARALLEL__GATEWAY__SENDER__QUEUE_99 requesting initial image from japan(7243)<ec><v3>:1031 [vm_7][info 2016/02/09 09:11:13.670 PST <Pooled Waiting Message Processor 12> tid=0x31d] _B__ln__PARALLEL__GATEWAY__SENDER__QUEUE_99 failed to get image from japan(7243)<ec><v3>:1031 [vm_7][info 2016/02/09 09:11:13.670 PST <Pooled Waiting Message Processor 12> tid=0x31d] Region _B__ln__PARALLEL__GATEWAY__SENDER__QUEUE_99 initialized persistent id: japan.gemstone.com/10.118.32.37:/japan1/users/build/jenkins/blds/workspace/Geode_develop_DistributedTests/gemfire-wan/build/distributedTest/dunit/vm7/ln_disk_1455037851360_7 created at timestamp 1455037851625 version 0 diskStoreId 919f2c2d3620438d-bb67ce053723de51 name null with data from japan(7243)<ec><v3>:1031. [vm_7][info 2016/02/09 09:11:13.671 PST <Pooled Waiting Message Processor 12> tid=0x31d] GII failed from all sources, but members are still online. Retrying the GII. {noformat} It finally stops, when vm_6 stops waiting for resource manager threads and shuts down: {noformat} [vm_6][warn 2016/02/09 09:13:13.669 PST <RMI TCP Connection(5)-10.118.32.37> tid=0x12] Failed to stop resource manager threads in 120 seconds [vm_6][info 2016/02/09 09:13:13.816 PST <RMI TCP Connection(5)-10.118.32.37> tid=0x12] Shutting down DistributionManager japan(7243)<ec><v3>:1031. {noformat} > Various WAN dunit tests sometimes cause OOME during combineReports task > ----------------------------------------------------------------------- > > Key: GEODE-969 > URL: https://issues.apache.org/jira/browse/GEODE-969 > Project: Geode > Issue Type: Bug > Components: wan > Reporter: Barry Oglesby > > The combineReports task sometimes throws a 'java.lang.OutOfMemoryError: Java > heap space' error when creating the JUnit test report like: > {noformat} > :combineReports > All test reports at > /japan1/users/build/jenkins/blds/workspace/GemFire_develop_closed_CC/gemfire/closed/build/reports/combined > FAILURE: Build failed with an exception. > Where: > Build file > '/japan1/users/build/jenkins/blds/workspace/GemFire_develop_closed_CC/gemfire/closed/pivotalgf-assembly/build.gradle' > line: 156 > What went wrong: > Execution failed for task ':pivotalgf-assembly:legacyDunit'. > > There were failing tests. > Try: > Run with --stacktrace option to get the stack trace. Run with --info or > --debug option to get more log output. > BUILD FAILED > Total time: 2 hrs 8 mins 51.811 secs > Recording test results > ERROR: Publisher 'Publish JUnit test result report' aborted due to exception: > java.io.IOException: remote file operation failed: > /japan1/users/build/jenkins/blds/workspace/GemFire_develop_closed_CC at > hudson.remoting.Channel@a3f9b9e:Japan: java.io.IOException: Remote call on > Japan failed > at hudson.FilePath.act(FilePath.java:987) > at hudson.FilePath.act(FilePath.java:969) > at hudson.tasks.junit.JUnitParser.parseResult(JUnitParser.java:90) > at hudson.tasks.junit.JUnitResultArchiver.parse(JUnitResultArchiver.java:120) > at > hudson.tasks.junit.JUnitResultArchiver.perform(JUnitResultArchiver.java:137) > at > hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:75) > at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20) > at > hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:779) > at > hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:726) > at hudson.model.Build$BuildExecution.post2(Build.java:185) > at > hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:671) > at hudson.model.Run.execute(Run.java:1766) > at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43) > at hudson.model.ResourceController.execute(ResourceController.java:98) > at hudson.model.Executor.run(Executor.java:408) > Caused by: java.io.IOException: Remote call on Japan failed > at hudson.remoting.Channel.call(Channel.java:786) > at hudson.FilePath.act(FilePath.java:980) > ... 14 more > Caused by: java.lang.OutOfMemoryError: Java heap space > at > com.sun.org.apache.xerces.internal.util.XMLStringBuffer.append(XMLStringBuffer.java:208) > at > com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanData(XMLEntityScanner.java:1378) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanCDATASection(XMLDocumentFragmentScannerImpl.java:1654) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:3020) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606) > at > com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:117) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510) > at > com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848) > at > com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777) > at > com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141) > at > com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213) > at > com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:649) > at org.dom4j.io.SAXReader.read(SAXReader.java:465) > at org.dom4j.io.SAXReader.read(SAXReader.java:264) > at hudson.tasks.junit.SuiteResult.parse(SuiteResult.java:123) > at hudson.tasks.junit.TestResult.parse(TestResult.java:282) > at hudson.tasks.junit.TestResult.parsePossiblyEmpty(TestResult.java:228) > at hudson.tasks.junit.TestResult.parse(TestResult.java:163) > at hudson.tasks.junit.TestResult.parse(TestResult.java:146) > at hudson.tasks.junit.TestResult.(TestResult.java:122) > at > hudson.tasks.junit.JUnitParser$ParseResultCallable.invoke(JUnitParser.java:119) > at > hudson.tasks.junit.JUnitParser$ParseResultCallable.invoke(JUnitParser.java:93) > at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2691) > at hudson.remoting.UserRequest.perform(UserRequest.java:121) > at hudson.remoting.UserRequest.perform(UserRequest.java:49) > at hudson.remoting.Request$2.run(Request.java:326) > at > hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > at ......remote call to Japan(Native Method) > at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1413) > at hudson.remoting.UserResponse.retrieve(UserRequest.java:221) > at hudson.remoting.Channel.call(Channel.java:778) > ... 15 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)