[
https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883047#comment-15883047
]
Christian Esken commented on CASSANDRA-13265:
---------------------------------------------
The thread dumps show that several threads park on the same objects:
- 324 threads are waiting on the same object while trying to iterate over the queue (expiration).
- 24 threads wait on a different object; as far as we can see, they are trying to read from the queue.
{code}
--- cassandra.pb-cache4-dus.2017-02-20-01-41-14.td
-------------------------------------------------------------------
1 - parking to wait for <0x00000001c04b1748> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00000001c056d4f0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00000001c0579c60> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
24 - parking to wait for <0x00000001c058ce50> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00000001c058e520> (a java.util.concurrent.Semaphore$NonfairSync)
1 - parking to wait for <0x00000001c058ee50> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00000001c0592bc0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00000001c0593058> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00000001c0593ae0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00000001c05958d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00000001c059f788> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
4 - parking to wait for <0x00000001c07f5ea8> (a java.util.concurrent.SynchronousQueue$TransferStack)
1 - parking to wait for <0x00000001c0df0548> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00000001c4b52790> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00000001c56a7ca8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00000001c56beea8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00000001c56bf2d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
324 - parking to wait for <0x00000001c5d5a150> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
1 - parking to wait for <0x00000001c628edb0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00000001c6290b78> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00000001c62958a8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00000001c6295b08> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00000001c72343a8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00000001c7581d58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00000001c8dd5738> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00000001ccdc3b80> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00000001cd22e1b0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00000001f3c39428> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00000001fb43f5d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
1 - parking to wait for <0x00000002003b6018> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
{code}
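The summary above counts "parking to wait for" lines per synchronizer address. The comment does not say how it was produced; a minimal sketch that builds the same kind of per-object count from a jstack-format dump could look like this. It assumes the HotSpot line format `- parking to wait for <0x...> (a some.Class)`, possibly wrapped across lines as in this email; the class name `ParkingHistogram` is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: counts how many threads park on each synchronizer
 * in a jstack-style thread dump.
 */
public class ParkingHistogram {

    // Captures the object address and its class. \s+ tolerates the double
    // space jstack prints after "for" and line-wrapped output.
    private static final Pattern PARKING = Pattern.compile(
            "parking to wait for\\s+<(0x[0-9a-f]+)>\\s+\\(a\\s+([\\w.$]+)\\)");

    /** Returns "address (class)" -> number of parked threads, sorted by address. */
    public static Map<String, Integer> count(String dump) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = PARKING.matcher(dump);
        while (m.find()) {
            String key = m.group(1) + " (" + m.group(2) + ")";
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        String dump = new String(Files.readAllBytes(Paths.get(args[0])));
        // Print in the same "count - parking to wait for ..." shape as the summary.
        count(dump).forEach((key, n) ->
                System.out.println(n + " - parking to wait for " + key));
    }
}
```

Sorting by address (a `TreeMap` over equal-length hex strings) keeps the output order stable between runs; a single entry with a very high count, like the 324 threads on one `ReentrantLock$NonfairSync`, stands out immediately.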
> Communication breakdown in OutboundTcpConnection
> ------------------------------------------------
>
> Key: CASSANDRA-13265
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13265
> Project: Cassandra
> Issue Type: Bug
> Environment: Cassandra 3.0.9
> Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version 1.8.0_112-b15)
> Linux 3.16
> Reporter: Christian Esken
> Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz,
> cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz
>
>
> I observed that sometimes a single node in a Cassandra cluster fails to
> communicate with the other nodes. This can happen at any time, during peak
> load or low load. Restarting that single node fixes the issue.
> Before going into details, I want to state that I have analyzed the
> situation and am already developing a possible fix. Here is the analysis so
> far:
> - A thread dump taken in this situation showed 324 threads in the
> OutboundTcpConnection class waiting to lock the backlog queue in order to
> expire messages.
> - A class histogram shows 262508 instances of
> OutboundTcpConnection$QueuedMessage.
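A rough upper bound on the resulting lock traffic, assuming (our assumption, not a measurement) that each of the $T = 324$ waiting threads eventually walks the entire backlog of $Q = 262508$ queued messages and takes the full queue lock once per `iterator.next()` call:

```latex
N_{\text{lock}} \approx T \times Q = 324 \times 262508 = 85\,052\,592
```

That is on the order of 85 million full-lock acquisitions for a single round of expiration, each of which, per the description here, excludes both readers and writers while it is held.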
> What is the effect? Once the Cassandra node has accumulated a certain number
> of queued messages, it starts thrashing itself to death. Each of the threads
> fully locks the queue for reading and writing by calling iterator.next(),
> making the situation progressively worse.
> - Writing: a writer can actually write to the queue only after 262508
> locking operations.
> - Reading: also blocked, because 324 threads are calling iterator.next() and
> fully locking the queue.
> This means that writing blocks the queue for reading, and readers may even be
> starved, which makes the situation worse still.
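One possible mitigation, sketched here under stated assumptions and not the fix the reporter mentions developing: let at most one thread run expiration at a time, and have every other caller skip it instead of parking on the queue lock. The sketch assumes the backlog is a `java.util.concurrent.LinkedBlockingQueue` (consistent with the `ReentrantLock$NonfairSync` entries in the dump); in JDK 8 its iterator takes both internal locks on every `next()` and `remove()`. The names `Backlog`, `Message`, and `expire` are hypothetical stand-ins, not Cassandra's API.

```java
import java.util.Iterator;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical backlog wrapper with single-threaded expiration. */
public class Backlog {

    /** Hypothetical stand-in for OutboundTcpConnection$QueuedMessage. */
    public static final class Message {
        final long expiresAtNanos;

        public Message(long expiresAtNanos) { this.expiresAtNanos = expiresAtNanos; }

        // Subtraction keeps the comparison correct if System.nanoTime() wraps.
        boolean isExpired(long nowNanos) { return nowNanos - expiresAtNanos >= 0; }
    }

    // In JDK 8, each iterator next()/remove() on this queue takes both its put
    // and take locks, so every step of a scan blocks producers and consumers.
    private final BlockingQueue<Message> queue = new LinkedBlockingQueue<>();

    // True while some thread is scanning the queue for expired messages.
    private final AtomicBoolean expiring = new AtomicBoolean(false);

    public void add(Message m) { queue.add(m); }

    public int size() { return queue.size(); }

    /**
     * Removes expired messages. Returns the number dropped, or -1 if another
     * thread was already expiring and this call skipped the scan.
     */
    public int expire(long nowNanos) {
        if (!expiring.compareAndSet(false, true)) {
            return -1; // someone else is scanning; do not pile onto the lock
        }
        try {
            int dropped = 0;
            Iterator<Message> it = queue.iterator();
            while (it.hasNext()) {
                if (it.next().isExpired(nowNanos)) {
                    it.remove();
                    dropped++;
                }
            }
            return dropped;
        } finally {
            expiring.set(false); // allow the next caller to expire
        }
    }
}
```

The design choice is to skip rather than wait: a caller that loses the `compareAndSet` race returns immediately, so at most one thread is ever iterating the backlog, instead of hundreds of threads queueing up behind the same lock as in the dump.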
> -----
> The setup is:
> - 3-node cluster
> - replication factor 2
> - Consistency LOCAL_ONE
> - No remote DCs
> - High write throughput (100000 INSERT statements per second, and more
> during peak times).
>
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)