[jira] [Created] (GEODE-4051) Two server jvms crashed at same time and caused some primary and redundant buckets to be cleared. Causing some buckets to get locked and not able to recover also after bouncing all servers

Igor Barchak (JIRA) Tue, 05 Dec 2017 08:37:23 -0800

Igor Barchak created GEODE-4051:
-----------------------------------

             Summary: Two server jvms crashed at same time and caused some 
primary and redundant buckets to be cleared. Causing some buckets to get locked 
and not able to recover also after bouncing all servers
                 Key: GEODE-4051
                 URL: https://issues.apache.org/jira/browse/GEODE-4051
             Project: Geode
          Issue Type: Bug
          Components: core
            Reporter: Igor Barchak
             Fix For: 1.2.0



"Pooled Waiting Message Processor 5" tid=0x162
    java.lang.Thread.State: TIMED_WAITING
        at sun.misc.Unsafe.park(Native Method)
        -  waiting on java.util.concurrent.CountDownLatch$Sync@1993a5
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
        at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:64)
        at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:715)
        at org.apache.geode.distributed.internal.ReplyProcessor21.waitForReplies(ReplyProcessor21.java:644)
        at org.apache.geode.distributed.internal.ReplyProcessor21.waitForReplies(ReplyProcessor21.java:624)
        at org.apache.geode.distributed.internal.ReplyProcessor21.waitForReplies(ReplyProcessor21.java:519)
        at org.apache.geode.internal.cache.StateFlushOperation.flush(StateFlushOperation.java:243)
        at org.apache.geode.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:349)
        at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1168)
        at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1023)
        at org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:253)
        at org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:962)
        at org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:726)
        at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:414)
        -  locked org.apache.geode.internal.cache.ProxyBucketRegion@6820a0b6
        at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:272)
        at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2815)
        at org.apache.geode.internal.cache.partitioned.ManageBackupBucketMessage.operateOnPartitionedRegion(ManageBackupBucketMessage.java:148)
        at org.apache.geode.internal.cache.partitioned.PartitionMessage.process(PartitionMessage.java:332)





The problem appears to have been introduced by the following commit:

https://github.com/apache/geode/commit/3a1062e245b3ded52ea3f6b6de0aff94ce846fa3?diff=split

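See StateMarkerMessage.process.

The first if branch has no finally block, while the else branch does.

The first if branch previously had no 'waitFor' operation; that call was 
introduced in this commit. If the wait fails there, the reply may never be 
sent, which would be consistent with the thread above staying blocked in 
StateFlushOperation.flush.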
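
The difference between a branch that waits and then replies, and one that 
replies from a finally block, can be illustrated with a minimal sketch. This 
is not the Geode code: ReplyInFinallySketch, WaitStep and the handler methods 
are hypothetical stand-ins, and the failing wait is simulated with an 
exception. It only shows why a reply placed after an unguarded wait can be 
skipped.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a message handler; none of these names are Geode APIs. */
public class ReplyInFinallySketch {

    /** Hypothetical wait step that may fail, e.g. on a timeout or a departed member. */
    interface WaitStep {
        void waitFor();
    }

    /** Shape described for the first if branch: wait, then reply, with no finally. */
    static void processWithoutFinally(WaitStep step, List<String> sentReplies) {
        step.waitFor();               // if this throws, the next line never runs
        sentReplies.add("reply");
    }

    /** Shape described for the else branch: the reply is sent from finally. */
    static void processWithFinally(WaitStep step, List<String> sentReplies) {
        try {
            step.waitFor();
        } finally {
            sentReplies.add("reply"); // runs whether or not waitFor() throws
        }
    }

    /** Runs one handler with a failing wait step and returns how many replies went out. */
    static int repliesSentWhenWaitFails(boolean useFinally) {
        List<String> sentReplies = new ArrayList<>();
        WaitStep failing = () -> {
            throw new IllegalStateException("simulated wait failure");
        };
        try {
            if (useFinally) {
                processWithFinally(failing, sentReplies);
            } else {
                processWithoutFinally(failing, sentReplies);
            }
        } catch (IllegalStateException expected) {
            // The exception propagates in both variants; only the reply differs.
        }
        return sentReplies.size();
    }

    public static void main(String[] args) {
        System.out.println("without finally: " + repliesSentWhenWaitFails(false));
        System.out.println("with finally:    " + repliesSentWhenWaitFails(true));
    }
}
```

In this sketch the variant without finally sends no reply when the wait 
fails, while the variant with finally still sends one. A requester waiting 
for that reply would keep waiting in the first case.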




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
