[ https://issues.apache.org/jira/browse/GEODE-9075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mario Ivanac updated GEODE-9075:
--------------------------------
Description:
A Geode cluster is deployed in a Kubernetes environment, with Istio sidecars
injected between cluster members. While running traffic, if any Istio sidecar
is restarted, a thread will get stuck indefinitely while waiting for a reply
to a sent message.
After detailed analysis, it seems that restarting the proxy causes a message
to be lost in some cases, leaving the sending side waiting:
[warn 2021/03/25 21:04:47.282 CET server2 <ThreadsMonitor> tid=0x12] Thread
<64> (0x40) that was executed at <25 Mar 2021 21:03:53 CET> has been stuck for
<53.897 seconds> and number of thread monitor iteration <1>
Thread Name <Function Execution Processor2> state <TIMED_WAITING>
Waiting on <java.util.concurrent.CountDownLatch$Sync@7c7f9898>
Executor Group <FunctionExecutionPooledExecutor>
Monitored metric <ResourceManagerStats.numThreadsStuck>
Thread stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:736)
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:811)
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:784)
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:874)
org.apache.geode.internal.cache.DistributedCacheOperation.waitForAckIfNeeded(DistributedCacheOperation.java:811)
org.apache.geode.internal.cache.DistributedCacheOperation._distribute(DistributedCacheOperation.java:699)
org.apache.geode.internal.cache.DistributedCacheOperation.startOperation(DistributedCacheOperation.java:277)
org.apache.geode.internal.cache.DistributedCacheOperation.distribute(DistributedCacheOperation.java:318)
org.apache.geode.internal.cache.DistributedRegion.distributeUpdate(DistributedRegion.java:520)
...
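The wait pattern visible in the stack trace above can be illustrated with a
minimal Java sketch. This is not Geode code: LostReplySketch, ReplyWaiter,
replyReceived and awaitReply are hypothetical names, and the single-latch
design is an assumption made only for illustration. It shows that a sender
blocked on a latch wakes up only when the reply arrives, so a lost reply
leaves it waiting unless the wait has a deadline.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LostReplySketch {

  /** Hypothetical stand-in for a reply processor; NOT Geode's ReplyProcessor21. */
  static final class ReplyWaiter {
    // Released once, when the (single) expected reply is received.
    private final CountDownLatch latch = new CountDownLatch(1);

    /** Called by the receiving side when the reply message arrives. */
    void replyReceived() {
      latch.countDown();
    }

    /**
     * Waits up to timeoutMillis for the reply.
     * Returns true if the reply arrived, false if the deadline passed
     * or the waiting thread was interrupted.
     */
    boolean awaitReply(long timeoutMillis) {
      try {
        return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        return false;
      }
    }
  }

  public static void main(String[] args) {
    // Case 1: the reply is lost in transit, so nobody calls replyReceived().
    ReplyWaiter lost = new ReplyWaiter();
    System.out.println("reply lost, reply seen: " + lost.awaitReply(100));

    // Case 2: the reply is delivered by another thread.
    ReplyWaiter delivered = new ReplyWaiter();
    new Thread(delivered::replyReceived).start();
    System.out.println("reply delivered, reply seen: " + delivered.awaitReply(1000));
  }
}
```

A waiter that simply repeats the timed wait with no overall deadline would
behave like the stuck thread reported here: each individual wait expires, but
the loop never ends because the lost reply never arrives.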
was:
A Geode cluster is deployed in a Kubernetes environment, with Istio sidecars
injected between cluster members. While running traffic, if any Istio sidecar
is restarted, a thread can get stuck indefinitely while waiting for a reply
to a sent message.
[warn 2021/03/25 21:04:47.282 CET server2 <ThreadsMonitor> tid=0x12] Thread
<64> (0x40) that was executed at <25 Mar 2021 21:03:53 CET> has been stuck for
<53.897 seconds> and number of thread monitor iteration <1>
Thread Name <Function Execution Processor2> state <TIMED_WAITING>
Waiting on <java.util.concurrent.CountDownLatch$Sync@7c7f9898>
Executor Group <FunctionExecutionPooledExecutor>
Monitored metric <ResourceManagerStats.numThreadsStuck>
Thread stack:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:736)
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:811)
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:784)
org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:874)
org.apache.geode.internal.cache.DistributedCacheOperation.waitForAckIfNeeded(DistributedCacheOperation.java:811)
org.apache.geode.internal.cache.DistributedCacheOperation._distribute(DistributedCacheOperation.java:699)
org.apache.geode.internal.cache.DistributedCacheOperation.startOperation(DistributedCacheOperation.java:277)
org.apache.geode.internal.cache.DistributedCacheOperation.distribute(DistributedCacheOperation.java:318)
org.apache.geode.internal.cache.DistributedRegion.distributeUpdate(DistributedRegion.java:520)
...
> Thread stuck indefinitely when using Istio/Sidecar
> --------------------------------------------------
>
> Key: GEODE-9075
> URL: https://issues.apache.org/jira/browse/GEODE-9075
> Project: Geode
> Issue Type: Bug
> Reporter: Mario Ivanac
> Assignee: Mario Ivanac
> Priority: Minor
> Labels: pull-request-available
>
> A Geode cluster is deployed in a Kubernetes environment, with Istio sidecars
> injected between cluster members. While running traffic, if any Istio sidecar
> is restarted, a thread will get stuck indefinitely while waiting for a reply
> to a sent message.
> After detailed analysis, it seems that restarting the proxy causes a message
> to be lost in some cases, leaving the sending side waiting:
>
> [warn 2021/03/25 21:04:47.282 CET server2 <ThreadsMonitor> tid=0x12] Thread
> <64> (0x40) that was executed at <25 Mar 2021 21:03:53 CET> has been stuck
> for <53.897 seconds> and number of thread monitor iteration <1>
> Thread Name <Function Execution Processor2> state <TIMED_WAITING>
> Waiting on <java.util.concurrent.CountDownLatch$Sync@7c7f9898>
> Executor Group <FunctionExecutionPooledExecutor>
> Monitored metric <ResourceManagerStats.numThreadsStuck>
> Thread stack:
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
> java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
> org.apache.geode.internal.util.concurrent.StoppableCountDownLatch.await(StoppableCountDownLatch.java:72)
> org.apache.geode.distributed.internal.ReplyProcessor21.basicWait(ReplyProcessor21.java:736)
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:811)
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:784)
> org.apache.geode.distributed.internal.ReplyProcessor21.waitForRepliesUninterruptibly(ReplyProcessor21.java:874)
> org.apache.geode.internal.cache.DistributedCacheOperation.waitForAckIfNeeded(DistributedCacheOperation.java:811)
> org.apache.geode.internal.cache.DistributedCacheOperation._distribute(DistributedCacheOperation.java:699)
> org.apache.geode.internal.cache.DistributedCacheOperation.startOperation(DistributedCacheOperation.java:277)
> org.apache.geode.internal.cache.DistributedCacheOperation.distribute(DistributedCacheOperation.java:318)
> org.apache.geode.internal.cache.DistributedRegion.distributeUpdate(DistributedRegion.java:520)
> ...