Igor Barchak created GEODE-4051:
-----------------------------------
Summary: Two server JVMs crashed at the same time, which caused some
primary and redundant buckets to be cleared. Some buckets then became locked
and could not recover, even after bouncing all servers
Key: GEODE-4051
URL: https://issues.apache.org/jira/browse/GEODE-4051
Project: Geode
Issue Type: Bug
Components: core
Reporter: Igor Barchak
Fix For: 1.2.0
"Pooled Waiting Message Processor 5" tid=0x162
java.lang.Thread.State: TIMED_WAITING
at sun.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.CountDownLatch$Sync@1993a5
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
at org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:64)
at org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:715)
at org.apache.geode.distributed.internal.ReplyProcessor21.waitForReplies(ReplyProcessor21.java:644)
at org.apache.geode.distributed.internal.ReplyProcessor21.waitForReplies(ReplyProcessor21.java:624)
at org.apache.geode.distributed.internal.ReplyProcessor21.waitForReplies(ReplyProcessor21.java:519)
at org.apache.geode.internal.cache.StateFlushOperation.flush(StateFlushOperation.java:243)
at org.apache.geode.internal.cache.InitialImageOperation.getFromOne(InitialImageOperation.java:349)
at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1168)
at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1023)
at org.apache.geode.internal.cache.BucketRegion.initialize(BucketRegion.java:253)
at org.apache.geode.internal.cache.LocalRegion.createSubregion(LocalRegion.java:962)
at org.apache.geode.internal.cache.PartitionedRegionDataStore.createBucketRegion(PartitionedRegionDataStore.java:726)
at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucket(PartitionedRegionDataStore.java:414)
- locked org.apache.geode.internal.cache.ProxyBucketRegion@6820a0b6
at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabFreeBucketRecursively(PartitionedRegionDataStore.java:272)
at org.apache.geode.internal.cache.PartitionedRegionDataStore.grabBucket(PartitionedRegionDataStore.java:2815)
at org.apache.geode.internal.cache.partitioned.ManageBackupBucketMessage.operateOnPartitionedRegion(ManageBackupBucketMessage.java:148)
at org.apache.geode.internal.cache.partitioned.PartitionMessage.process(PartitionMessage.java:332)
This appears to have been introduced by this fix:
https://github.com/apache/geode/commit/3a1062e245b3ded52ea3f6b6de0aff94ce846fa3?diff=split
See StateMarkerMessage.process.
The first if branch does not have a finally block, while the else branch does.
The first if branch did not previously contain a 'waitFor' operation; it was
added in this commit.
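
To illustrate why the missing finally block could matter, here is a minimal sketch. It is not Geode's actual code: the class, the Waiter interface, and sendReply are hypothetical names, and it assumes that the reply sent after the wait is what releases a remote thread blocked in waitForReplies, as in the dump above.

```java
import java.util.ArrayList;
import java.util.List;

public class ReplyInFinallySketch {
    /** Hypothetical stand-in for a blocking 'waitFor' call that may fail. */
    interface Waiter {
        void waitFor() throws Exception;
    }

    /** Records replies so the difference between the two shapes is observable. */
    static final List<String> repliesSent = new ArrayList<>();

    /** Hypothetical reply to the member that is waiting for replies. */
    static void sendReply(String tag) {
        repliesSent.add(tag);
    }

    /** Wait without finally: if waitFor throws, the reply is never sent. */
    static void processWithoutFinally(Waiter waiter) throws Exception {
        waiter.waitFor();
        sendReply("no-finally");
    }

    /** Wait inside try/finally: the reply is sent even if waitFor throws. */
    static void processWithFinally(Waiter waiter) throws Exception {
        try {
            waiter.waitFor();
        } finally {
            sendReply("with-finally");
        }
    }
}
```

In the first shape, an exception from the wait leaves the sender without a reply, so it keeps waiting; in the second, the reply goes out regardless of how the wait ends.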