[
https://issues.apache.org/jira/browse/GEODE-9075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mario Ivanac updated GEODE-9075:
--------------------------------
Description:
Geode cluster is deployed in kubernetes environment, and Istio/SideCars are
injected between cluster members. While running traffic, if any Istio/SideCar
is restarted, thread can get stuck indefinitely, while waiting for reply on
sent message.
[warn 2021/03/25 21:04:47.282 CET server2 <ThreadsMonitor> tid=0x12] Thread
<64> (0x40) that was executed at <25 Mar 2021 21:03:53 CET> has been stuck for
<53.897 seconds> and number of thread monitor iteration <1>
Thread Name <Function Execution Processor2> state <TIMED_WAITING>
Waiting on <java.util.concurrent.CountDownLatch$Sync@7c7f9898>
Executor Group <FunctionExecutionPooledExecutor>
Monitored metric <ResourceManagerStats.numThreadsStuck>
Thread stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:736)
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:811)
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:784)
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:874)
org.apache.geode.internal.cache.DistributedCacheOperation.waitForAckIfNeeded(DistributedCacheOperation.java:811)
org.apache.geode.internal.cache.DistributedCacheOperation._distribute(DistributedCacheOperation.java:699)
org.apache.geode.internal.cache.DistributedCacheOperation.startOperation(DistributedCacheOperation.java:277)
org.apache.geode.internal.cache.DistributedCacheOperation.distribute(DistributedCacheOperation.java:318)
org.apache.geode.internal.cache.DistributedRegion.distributeUpdate(DistributedRegion.java:520)
...
was:
Geode cluster is deployed in kubernetes environment, and Istio/SideCars are
injected between cluster members. While running traffic, if any Istio/SideCar
is restarted, thread can get stuck, while waiting for reply on sent message.
[warn 2021/03/25 21:04:47.282 CET server2 <ThreadsMonitor> tid=0x12] Thread
<64> (0x40) that was executed at <25 Mar 2021 21:03:53 CET> has been stuck for
<53.897 seconds> and number of thread monitor iteration <1>
Thread Name <Function Execution Processor2> state <TIMED_WAITING>
Waiting on <java.util.concurrent.CountDownLatch$Sync@7c7f9898>
Executor Group <FunctionExecutionPooledExecutor>
Monitored metric <ResourceManagerStats.numThreadsStuck>
Thread stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:736)
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:811)
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:784)
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:874)
org.apache.geode.internal.cache.DistributedCacheOperation.waitForAckIfNeeded(DistributedCacheOperation.java:811)
org.apache.geode.internal.cache.DistributedCacheOperation._distribute(DistributedCacheOperation.java:699)
org.apache.geode.internal.cache.DistributedCacheOperation.startOperation(DistributedCacheOperation.java:277)
org.apache.geode.internal.cache.DistributedCacheOperation.distribute(DistributedCacheOperation.java:318)
org.apache.geode.internal.cache.DistributedRegion.distributeUpdate(DistributedRegion.java:520)
...
> Thread stuck indefinitely when using Istio/Sidecar
> --------------------------------------------------
>
> Key: GEODE-9075
> URL: https://issues.apache.org/jira/browse/GEODE-9075
> Project: Geode
> Issue Type: Bug
> Reporter: Mario Ivanac
> Assignee: Mario Ivanac
> Priority: Major
>
> Geode cluster is deployed in kubernetes environment, and Istio/SideCars are
> injected between cluster members. While running traffic, if any Istio/SideCar
> is restarted, thread can get stuck indefinitely, while waiting for reply on
> sent message.
>
> [warn 2021/03/25 21:04:47.282 CET server2 <ThreadsMonitor> tid=0x12] Thread
> <64> (0x40) that was executed at <25 Mar 2021 21:03:53 CET> has been stuck
> for <53.897 seconds> and number of thread monitor iteration <1>
> Thread Name <Function Execution Processor2> state <TIMED_WAITING>
> Waiting on <java.util.concurrent.CountDownLatch$Sync@7c7f9898>
> Executor Group <FunctionExecutionPooledExecutor>
> Monitored metric <ResourceManagerStats.numThreadsStuck>
> Thread stack:
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
> java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>
> org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
>
> org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:736)
>
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:811)
>
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:784)
>
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:874)
>
> org.apache.geode.internal.cache.DistributedCacheOperation.waitForAckIfNeeded(DistributedCacheOperation.java:811)
>
> org.apache.geode.internal.cache.DistributedCacheOperation._distribute(DistributedCacheOperation.java:699)
>
> org.apache.geode.internal.cache.DistributedCacheOperation.startOperation(DistributedCacheOperation.java:277)
>
> org.apache.geode.internal.cache.DistributedCacheOperation.distribute(DistributedCacheOperation.java:318)
>
> org.apache.geode.internal.cache.DistributedRegion.distributeUpdate(DistributedRegion.java:520)
> ...
--
This message was sent by Atlassian Jira
(v8.3.4#803005)