[ https://issues.apache.org/jira/browse/GEODE-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783991#comment-15783991 ]

Anthony Baker commented on GEODE-2240:
--------------------------------------

I think this fix is causing a deadlock; see the thread dump below:

{code}
vm_6_persist5_ceabe8ff-1c19-4ac0-5ed1-c77f9d8b22ee_30240:vm_6_thr_17_persist5_ceabe8ff-1c19-4ac0-5ed1-c77f9d8b22ee_30240 ID=0x1f(31) state=TIMED_WAITING
        waiting to lock <java.util.concurrent.locks.ReentrantLock$NonfairSync@3be415f9>
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)
        at java.util.concurrent.locks.ReentrantLock.tryLock(ReentrantLock.java:442)
        at org.apache.geode.internal.util.concurrent.StoppableReentrantLock.lockInterruptibly(StoppableReentrantLock.java:88)
        at org.apache.geode.internal.util.concurrent.StoppableReentrantLock.lock(StoppableReentrantLock.java:71)
        at org.apache.geode.internal.cache.TombstoneService$TombstoneSweeper.lockQueueHead(TombstoneService.java:845)
        at org.apache.geode.internal.cache.TombstoneService$TombstoneSweeper.removeUnexpiredIf(TombstoneService.java:801)
        at org.apache.geode.internal.cache.TombstoneService$TombstoneSweeper.access$000(TombstoneService.java:718)
        at org.apache.geode.internal.cache.TombstoneService.gcTombstones(TombstoneService.java:221)
        locked <java.lang.Object@e63a28>
        at org.apache.geode.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:512)
        at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1307)
        at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1101)
        at org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:959)
        at org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:824)
        at diskRecovery.RecoveryTest.createSubregions(RecoveryTest.java:3036)
        at diskRecovery.RecoveryTest.createRegionHier(RecoveryTest.java:2990)
        at diskRecovery.RecoveryTest.HydraTask_initialize(RecoveryTest.java:245)
        locked <java.lang.Class@45829aeb>
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at hydra.MethExecutor.execute(MethExecutor.java:182)
        at hydra.MethExecutor.execute(MethExecutor.java:150)
        at hydra.TestTask.execute(TestTask.java:192)
        at hydra.RemoteTestModule$1.run(RemoteTestModule.java:212)

vm_6_persist5_ceabe8ff-1c19-4ac0-5ed1-c77f9d8b22ee_30240:Replicate/Partition Region Garbage Collector ID=0x54(84) state=BLOCKED
        waiting to lock <java.lang.Object@e63a28>
        at org.apache.geode.internal.cache.TombstoneService$ReplicateTombstoneSweeper.expireTombstone(TombstoneService.java:672)
        at org.apache.geode.internal.cache.TombstoneService$TombstoneSweeper.checkOldestUnexpired(TombstoneService.java:982)
        at org.apache.geode.internal.cache.TombstoneService$TombstoneSweeper.run(TombstoneService.java:881)
        at java.lang.Thread.run(Thread.java:745)
Locked synchronizers:
java.util.concurrent.locks.ReentrantLock$NonfairSync@3be415f9
{code}
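
Reading the dump: thread 17 has locked the monitor java.lang.Object@e63a28 (in gcTombstones) and is waiting for ReentrantLock$NonfairSync@3be415f9, while the Garbage Collector thread lists that ReentrantLock under "Locked synchronizers" and is blocked on the same monitor. That is a lock-order inversion. The sketch below shows one common way to avoid it: both code paths take the two locks in the same order. The class, field, and method names are hypothetical stand-ins, not Geode's actual code, and this is not necessarily the fix that should be applied to TombstoneService.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: not Geode code. Two paths that need the same pair of
// locks always acquire them in the same order, so neither can hold one lock
// while waiting for a thread that holds the other.
class LockOrderSketch {
    // Stand-in for the ReentrantLock seen in the dump (queue-head lock).
    private final ReentrantLock queueHeadLock = new ReentrantLock();
    // Stand-in for the java.lang.Object monitor seen in the dump.
    private final Object gcMonitor = new Object();
    private int passes = 0; // guarded by gcMonitor

    // Stand-in for the region-initialization path (thread 17 in the dump).
    void gcTombstones() {
        queueHeadLock.lock();              // first: queue-head lock
        try {
            synchronized (gcMonitor) {     // second: monitor
                passes++;
            }
        } finally {
            queueHeadLock.unlock();
        }
    }

    // Stand-in for the sweeper path (Garbage Collector thread in the dump).
    void expireTombstone() {
        queueHeadLock.lock();              // same order as gcTombstones()
        try {
            synchronized (gcMonitor) {
                passes++;
            }
        } finally {
            queueHeadLock.unlock();
        }
    }

    // Runs both paths concurrently and returns the total number of passes.
    static int runBoth(int iterations) {
        LockOrderSketch s = new LockOrderSketch();
        Thread init = new Thread(() -> {
            for (int i = 0; i < iterations; i++) s.gcTombstones();
        });
        Thread sweeper = new Thread(() -> {
            for (int i = 0; i < iterations; i++) s.expireTombstone();
        });
        init.start();
        sweeper.start();
        try {
            init.join();
            sweeper.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        synchronized (s.gcMonitor) {
            return s.passes;
        }
    }

    public static void main(String[] args) {
        System.out.println("passes completed: " + runBoth(1000));
    }
}
```

If either method took the monitor first and the ReentrantLock second, the two threads could end up in the state the dump shows.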


> unexpected NullPointerException from Tombstone service
> ------------------------------------------------------
>
>                 Key: GEODE-2240
>                 URL: https://issues.apache.org/jira/browse/GEODE-2240
>             Project: Geode
>          Issue Type: Bug
>          Components: regions
>            Reporter: Darrel Schneider
>            Assignee: Darrel Schneider
>             Fix For: 1.1.0
>
>
> A test failed and the logs were found to be full of NPEs from the tombstone 
> service:
> [severe 2016/12/20 02:04:35.605 UTC dataStoregemfire7_rs-StorageBTTest-2016-12-19-23-35-42-client-14_19508 <Replicate/Partition Region Garbage Collector> tid=0x44] GemFire garbage collection service encountered an unexpected exception
> java.lang.NullPointerException
>         at org.apache.geode.internal.cache.TombstoneService$TombstoneSweeper.lambda$purgeObsoleteTombstones$1(TombstoneService.java:938)
>         at org.apache.geode.internal.cache.TombstoneService$ReplicateTombstoneSweeper.removeExpiredIf(TombstoneService.java:479)
>         at org.apache.geode.internal.cache.TombstoneService$TombstoneSweeper.removeIf(TombstoneService.java:823)
>         at org.apache.geode.internal.cache.TombstoneService$TombstoneSweeper.purgeObsoleteTombstones(TombstoneService.java:937)
>         at org.apache.geode.internal.cache.TombstoneService$TombstoneSweeper.run(TombstoneService.java:880)
>         at java.lang.Thread.run(Thread.java:745)
> [severe 2016/12/20 02:05:45.987 UTC dataStoregemfire7_rs-StorageBTTest-2016-12-19-23-35-42-client-14_19508 <Replicate/Partition Region Garbage Collector> tid=0x44] GemFire garbage collection service encountered an unexpected exception
> java.lang.NullPointerException
>         at org.apache.geode.internal.cache.TombstoneService$ReplicateTombstoneSweeper.expireBatch(TombstoneService.java:524)
>         at org.apache.geode.internal.cache.TombstoneService$ReplicateTombstoneSweeper.checkExpiredTombstoneGC(TombstoneService.java:594)
>         at org.apache.geode.internal.cache.TombstoneService$TombstoneSweeper.run(TombstoneService.java:878)
>         at java.lang.Thread.run(Thread.java:745)
> Both of these stacks indicate that the "expiredTombstones" ArrayList somehow 
> has nulls in it. It is an ArrayList of Tombstone instances, and the only code 
> that adds to it first tests that the item it is adding is not null. The only 
> other operation that modifies it is the removal of an item.
> Perhaps unsafe concurrent access is happening, causing this code to see nulls.
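
For illustration of the hypothesis in the description: java.util.ArrayList is not thread-safe, so unsynchronized concurrent modification can leave it in an inconsistent state even when every caller checks for null before adding. A minimal sketch of the usual remedy follows, with every read and write of the list going through one lock. The class name and the use of String in place of Tombstone are assumptions for the example; this is not the actual TombstoneService code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: not Geode code. Every access to the backing
// ArrayList happens while holding this object's monitor.
class GuardedTombstoneList {
    private final List<String> expired = new ArrayList<>(); // guarded by "this"

    // Mirrors the described behavior: only non-null items are added.
    synchronized void add(String tombstone) {
        if (tombstone != null) {
            expired.add(tombstone);
        }
    }

    // Removal runs under the same lock as add().
    synchronized boolean removeIf(Predicate<String> test) {
        return expired.removeIf(test);
    }

    // Returns a copy so callers never iterate the shared list unlocked.
    synchronized List<String> snapshot() {
        return new ArrayList<>(expired);
    }

    // Adds items from several threads at once and returns the final contents.
    static List<String> fillConcurrently(int threads, int perThread) {
        GuardedTombstoneList list = new GuardedTombstoneList();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add("tombstone-" + id + "-" + i);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return list.snapshot();
    }
}
```

With the lock in place, the list returned by fillConcurrently(4, 500) holds all 2000 items and no nulls. Any new lock of this kind would also need to follow a consistent acquisition order with the other locks in the service.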



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
