[ 
https://issues.apache.org/jira/browse/GEODE-4650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shelley Lynn Hughes-Godfrey reopened GEODE-4650:
------------------------------------------------
      Assignee:     (was: Helena Bales)

Re-opening as this hang reproduced in CI:
   
https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/UpgradeTestOpenJDK11/builds/707

Since this was fixed in 1.8, we should perhaps still expect to see it in rolling 
upgrade tests that start from older versions (but we need a way to mark that 
for CI).

Hung Test:
2019-05-10 22:41:28.511 +0000
org.apache.geode.cache.wan.WANRollingUpgradeSecondaryEventsNotReprocessedAfterCurrentSiteMemberFailoverWithOldClient
testSecondaryEventsNotReprocessedAfterCurrentSiteMemberFailoverWithOldClient[from_v100]

Stack dump (from callstacks):
{noformat}
"RMI TCP Connection(3)-172.17.0.4" #35 daemon prio=5 os_prio=0 cpu=5492.78ms 
elapsed=2867.65s tid=0x00007f23f8001800 nid=0x212 waiting on condition  
[0x00007f244dab5000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at jdk.internal.misc.Unsafe.park(java.base@11.0.2/Native Method)
        - parking to wait for  <0x00000000e0804d68> (a java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.2/LockSupport.java:234)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(java.base@11.0.2/AbstractQueuedSynchronizer.java:1079)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(java.base@11.0.2/AbstractQueuedSynchronizer.java:1369)
        at java.util.concurrent.CountDownLatch.await(java.base@11.0.2/CountDownLatch.java:278)
        at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:64)
        at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:736)
        at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:812)
        at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:789)
        at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:879)
        at org.apache.geode.distributed.internal.locks.ElderInitProcessor.init(ElderInitProcessor.java:76)
        at org.apache.geode.distributed.internal.locks.ElderState.<init>(ElderState.java:57)
        at org.apache.geode.distributed.internal.DistributionManager.getElderStateWithTryLock(DistributionManager.java:3628)
        at org.apache.geode.distributed.internal.DistributionManager.getElderState(DistributionManager.java:3574)
        at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.startElderCall(GrantorRequestProcessor.java:254)
        at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.basicOp(GrantorRequestProcessor.java:377)
        at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.basicOp(GrantorRequestProcessor.java:352)
        at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.clearGrantor(GrantorRequestProcessor.java:340)
        at org.apache.geode.distributed.internal.locks.DLockService.clearGrantor(DLockService.java:885)
        at org.apache.geode.distributed.internal.locks.DLockGrantor.destroy(DLockGrantor.java:1274)
        - locked <0x00000000e0b17d48> (a org.apache.geode.distributed.internal.locks.DLockGrantor)
        at org.apache.geode.distributed.internal.locks.DLockService.nullLockGrantorId(DLockService.java:663)
        at org.apache.geode.distributed.internal.locks.DLockService.basicDestroy(DLockService.java:2606)
        at org.apache.geode.distributed.internal.locks.DLockService.destroyAndRemove(DLockService.java:2521)
        - locked <0x00000000e0b17e78> (a java.lang.Object)
        at org.apache.geode.distributed.internal.locks.DLockService.destroyServiceNamed(DLockService.java:2420)
        at org.apache.geode.distributed.DistributedLockService.destroy(DistributedLockService.java:98)
        at org.apache.geode.internal.cache.GemFireCacheImpl.destroyGatewaySenderLockService(GemFireCacheImpl.java:1943)
        at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2088)
        - locked <0x00000000e0922ad8> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
        at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1862)
        at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1858)
        at org.apache.geode.test.dunit.cache.internal.JUnit4CacheTestCase.closeCache(JUnit4CacheTestCase.java:327)
{noformat}

Test report artifacts from this job are available at:
{noformat}
http://files.apachegeode-ci.info/builds/apache-develop-main/1.10.0-SNAPSHOT.0269/test-artifacts/1557531731/upgradetestfiles-OpenJDK11-1.10.0-SNAPSHOT.0269.tgz
{noformat}


> DLockService.clearGrantor can potentially hang
> ----------------------------------------------
>
>                 Key: GEODE-4650
>                 URL: https://issues.apache.org/jira/browse/GEODE-4650
>             Project: Geode
>          Issue Type: Bug
>          Components: distributed lock service
>            Reporter: Jason Huynh
>            Priority: Major
>              Labels: pull-request-available, swat
>             Fix For: 1.8.0
>
>         Attachments: callstacks-2018-02-10-05-25-15.txt, 
> callstacks-2018-02-10-05-25-23.txt, callstacks-2018-02-10-05-25-30.txt
>
>          Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> There was a test run in the precheckin pipeline that hung with the following 
> stack:
>  
> {code:java}
> "RMI TCP Connection(1)-172.17.0.3" #30 daemon prio=5 os_prio=0 
> tid=0x00007f4560001800 nid=0x191 waiting on condition [0x00007f45771c0000]
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000000e082d298> (a java.util.concurrent.CountDownLatch$Sync)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
> at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:64)
> at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:715)
> at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:790)
> at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:766)
> at org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:853)
> at org.apache.geode.distributed.internal.locks.ElderInitProcessor.init(ElderInitProcessor.java:72)
> at org.apache.geode.distributed.internal.locks.ElderState.<init>(ElderState.java:56)
> at org.apache.geode.distributed.internal.ClusterDistributionManager.getElderStateWithTryLock(ClusterDistributionManager.java:3359)
> at org.apache.geode.distributed.internal.ClusterDistributionManager.getElderState(ClusterDistributionManager.java:3309)
> at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.startElderCall(GrantorRequestProcessor.java:238)
> at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.basicOp(GrantorRequestProcessor.java:347)
> at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.basicOp(GrantorRequestProcessor.java:327)
> at org.apache.geode.distributed.internal.locks.GrantorRequestProcessor.clearGrantor(GrantorRequestProcessor.java:318)
> at org.apache.geode.distributed.internal.locks.DLockService.clearGrantor(DLockService.java:872)
> at org.apache.geode.distributed.internal.locks.DLockGrantor.destroy(DLockGrantor.java:1227)
> - locked <0x00000000e0837ff0> (a org.apache.geode.distributed.internal.locks.DLockGrantor)
> at org.apache.geode.distributed.internal.locks.DLockService.nullLockGrantorId(DLockService.java:646)
> at org.apache.geode.distributed.internal.locks.DLockService.basicDestroy(DLockService.java:2358)
> at org.apache.geode.distributed.internal.locks.DLockService.destroyAndRemove(DLockService.java:2276)
> - locked <0x00000000e05c7468> (a java.lang.Object)
> at org.apache.geode.distributed.internal.locks.DLockService.destroyServiceNamed(DLockService.java:2214)
> at org.apache.geode.distributed.DistributedLockService.destroy(DistributedLockService.java:84)
> at org.apache.geode.internal.cache.GemFireCacheImpl.destroyGatewaySenderLockService(GemFireCacheImpl.java:2043)
> at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:2180)
> - locked <0x00000000e04653e0> (a java.lang.Class for org.apache.geode.internal.cache.GemFireCacheImpl)
> at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1960)
> at org.apache.geode.internal.cache.GemFireCacheImpl.close(GemFireCacheImpl.java:1950)
> at org.apache.geode.test.junit.rules.ServerStarterRule.stopMember(ServerStarterRule.java:99)
> at org.apache.geode.test.junit.rules.MemberStarterRule.after(MemberStarterRule.java:81)
> at org.apache.geode.test.dunit.rules.ClusterStartupRule.stopElementInsideVM(ClusterStartupRule.java:412)
> at org.apache.geode.test.junit.rules.VMProvider.lambda$stopVM$fe0d42dc$1(VMProvider.java:35)
> at org.apache.geode.test.junit.rules.VMProvider$$Lambda$53/208982926.run(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at hydra.MethExecutor.executeObject(MethExecutor.java:244)
> at org.apache.geode.test.dunit.standalone.RemoteDUnitVM.executeMethodOnObject(RemoteDUnitVM.java:70)
> at sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
> at sun.rmi.transport.Transport$1.run(Transport.java:200)
> at sun.rmi.transport.Transport$1.run(Transport.java:197)
> at java.security.AccessController.doPrivileged(Native Method)
> at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
> at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
> at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
> at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
> at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$7/1394836008.run(Unknown Source)
> at java.security.AccessController.doPrivileged(Native Method)
> at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Locked ownable synchronizers:
> - <0x00000000e0332230> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> - <0x00000000e08499b0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> - <0x00000000e08520f0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> {code}
> It looks like the cache is shutting down and we are unable to destroy the 
> lock service for the gateway sender.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
