Bruce Schuchardt created GEODE-950:
--------------------------------------

             Summary: split brain in wanAdminLocatorsPeerHAP2P
                 Key: GEODE-950
                 URL: https://issues.apache.org/jira/browse/GEODE-950
             Project: Geode
          Issue Type: Bug
          Components: membership
            Reporter: Bruce Schuchardt


This test starts locators simultaneously, each configured to know about the
other.  In the run below, two locators each created their own membership view,
forming a split brain at startup instead of a single distributed system.
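The startup split brain can be pictured as a check-then-act race. The following is a hypothetical toy model, not Geode's membership or join code: `SplitBrainRace`, `Locator` and the `coordinator` field are invented names. A latch forces the worst-case interleaving of a simultaneous start, in which both locators ask each other for a coordinator before either has formed a view. Each one sees no coordinator, starts its own view, and two views result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;

final class SplitBrainRace {

    /** Hypothetical locator: records the coordinator of the view it formed or joined. */
    static final class Locator {
        final String name;
        volatile String coordinator; // null until this locator has formed or joined a view
        Locator peer;                // the other locator it is configured to know about

        Locator(String name) { this.name = name; }
    }

    /** Starts two locators at the same time and returns how many distinct views exist afterwards. */
    static long simulate() throws InterruptedException {
        Locator a = new Locator("locator-a");
        Locator b = new Locator("locator-b");
        a.peer = b;
        b.peer = a;

        // Neither locator may decide until both have queried their peer:
        // this forces the interleaving of a simultaneous start.
        CountDownLatch bothQueried = new CountDownLatch(2);
        List<Thread> threads = new ArrayList<>();
        for (Locator self : Arrays.asList(a, b)) {
            Thread t = new Thread(() -> {
                String seen = self.peer.coordinator; // check: ask the peer for a coordinator
                bothQueried.countDown();
                try {
                    bothQueried.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                // act: nobody was coordinating when we asked, so start a new view
                self.coordinator = (seen != null) ? seen : self.name;
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return Arrays.asList(a, b).stream().map(l -> l.coordinator).distinct().count();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("distinct views: " + simulate()); // prints "distinct views: 2"
    }
}
```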

Host name: w2-2013-lin-12
OS name: Linux
Architecture: amd64
OS version: 3.10.0-229.el7.x86_64
Java version: 1.8.0_66
Java vm name: Java HotSpot(TM) 64-Bit Server VM
Java vendor: Oracle Corporation
Java home: /export/gcm/where/jdk/1.8.0_66/x86_64.linux/jre

  #####################################################
  
  GemFire Version 9.0.0-SNAPSHOT
  Source Date: 2016-02-03 16:09:18 -0800
  Source Revision: 3f7070f117dbd8f2e5fb436b6aed3469e9fca673
  Source Repository: develop
  
  Build Id: bruces 020416
  Build Date: 2016-02-04 16:02:44 -0800
  Build Version: 9.0.0-SNAPSHOT bruces 020416 2016-02-04 16:02:44 -0800 javac 1.8.0_66
  Build JDK: Java 1.8.0_66
  Build Platform: Linux 2.6.32-122.el6.x86_64 amd64
  
  #####################################################


Test was run from /export/frodo2/users/bruce/devel/gfasf/closed/gemfire-test/build/resources/test/newWan/discovery/newWanDiscovery.bt

Test:
parReg/newWan/parallel/discovery/wanAdminLocatorsPeerHAP2P.conf
   locatorHostsPerSite=4
   locatorThreadsPerVM=1
   locatorVMsPerHost=1
   maxOps=300
   peerHostsPerSite=2
   peerMem=256m
   peerThreadsPerVM=10
   peerVMsPerHost=2
   redundantCopies=1
   resultWaitSec=600
   wanSites=3

Run with local.conf:

hydra.HostPrms-hostNames = w2-2013-lin-12 w1-gst-dev03;

//randomSeed extracted from test:
hydra.Prms-randomSeed=1454836695339;

*** Test failed with this error:
CLIENT vm_17_thr_64_peer_2_1_w1-gst-dev03_3365
INITTASK[2] newWan.WANTest.HydraTask_initPeerTask
HANG a client exceeded max result wait sec: 600

*** Last client logging by hung thread
[info 2016/02/07 01:30:48.650 PST <vm_17_thr_64_peer_2_1_w1-gst-dev03_3365> tid=0x1e] Configured disk store factory: com.gemstone.gemfire.internal.cache.DiskStoreFactoryImpl@16cf1ca8

*** Test declared hung 595996 ms after last client logging
[severe 2016/02/07 01:40:44.646 PST <vm_17_thr_68_peer_2_1_w1-gst-dev03_2152 Dynamic Client VM Stopper> tid=0x274] Result for vm_17_thr_64_peer_2_1_w1-gst-dev03_3365: INITTASK[2] newWan.WANTest.HydraTask_initPeerTask: HANG a client exceeded max result wait sec: 600

*** Hung thread
"vm_17_thr_64_peer_2_1_w1-gst-dev03_3365" #30 daemon prio=5 os_prio=0 
tid=0x00007f0ca0026000 nid=0xdd3 waiting on condition [0x00007f0cafffd000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000f7429b60> (a java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
        at com.gemstone.gemfire.internal.cache.BucketPersistenceAdvisor.waitForPrimaryPersistentRecovery(BucketPersistenceAdvisor.java:363)
        at com.gemstone.gemfire.internal.cache.ProxyBucketRegion.waitForPrimaryPersistentRecovery(ProxyBucketRegion.java:633)
        at com.gemstone.gemfire.internal.cache.PRHARedundancyProvider.recoverPersistentBuckets(PRHARedundancyProvider.java:1821)
        at com.gemstone.gemfire.internal.cache.PartitionedRegion.initPRInternals(PartitionedRegion.java:1073)
        - locked <0x00000000f567aa10> (a com.gemstone.gemfire.internal.cache.PartitionedRegion)
        at com.gemstone.gemfire.internal.cache.PartitionedRegion.initialize(PartitionedRegion.java:1193)
        at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3171)
        at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:3063)
        at com.gemstone.gemfire.internal.cache.GemFireCacheImpl.createRegion(GemFireCacheImpl.java:3052)
        at hydra.RegionHelper.createRegion(RegionHelper.java:129)
        - locked <0x00000000f68035b0> (a java.lang.Class for hydra.RegionHelper)
        at hydra.RegionHelper.createRegion(RegionHelper.java:93)
        - locked <0x00000000f68035b0> (a java.lang.Class for hydra.RegionHelper)
        at hydra.RegionHelper.createRegion(RegionHelper.java:80)
        - locked <0x00000000f68035b0> (a java.lang.Class for hydra.RegionHelper)
        at newWan.WANTest.initDatastoreRegion(WANTest.java:439)
        at newWan.WANTest.HydraTask_initPeerTask(WANTest.java:797)
        - locked <0x00000000f58842e8> (a java.lang.Class for newWan.WANTest)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at hydra.MethExecutor.execute(MethExecutor.java:198)
        at hydra.MethExecutor.execute(MethExecutor.java:162)
        at hydra.TestTask.execute(TestTask.java:195)
        at hydra.RemoteTestModule$1.run(RemoteTestModule.java:216)

Stack for hung thread vm_17_thr_64_peer_2_1_w1-gst-dev03_3365 was found 3 times 
and was unchanging.
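The hung thread is parked in an untimed `CountDownLatch.await()` called from `BucketPersistenceAdvisor.waitForPrimaryPersistentRecovery`. A minimal sketch, assuming (hypothetically) that this latch is counted down only once the member hosting the primary bucket finishes persistent recovery, suggests how a split brain can surface as this hang: if that member sits in the other view, nothing in this partition ever counts the latch down. `RecoveryWaitSketch` and `waitForPrimaryRecovery` are invented names, and the sketch uses the timed `await(long, TimeUnit)` overload so that it terminates, whereas the trace shows an untimed wait.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class RecoveryWaitSketch {

    /**
     * Hypothetical stand-in for the wait in waitForPrimaryPersistentRecovery:
     * returns true once the primary's member has counted the latch down.
     */
    static boolean waitForPrimaryRecovery(CountDownLatch primaryRecovered, long timeoutMs)
            throws InterruptedException {
        // An untimed primaryRecovered.await() would park here indefinitely,
        // matching "WAITING (parking)" in the thread dump; the timed overload
        // returns false instead so this demo terminates.
        return primaryRecovered.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch primaryRecovered = new CountDownLatch(1);
        // Assumed scenario: the primary is hosted in the other view, so no
        // member in this partition ever calls primaryRecovered.countDown().
        boolean recovered = waitForPrimaryRecovery(primaryRecovered, 200);
        System.out.println("primary recovered: " + recovered); // prints "primary recovered: false"
    }
}
```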

See http://hydradb.gemstone.com/hdb/testresult/920073



