[
https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887685#comment-15887685
]
Christian Esken commented on CASSANDRA-13265:
---------------------------------------------
I see your argument. On larger clusters this may become problematic. Let's
evaluate different solutions:
- Offload expiration to a "random" regular Thread, but only a single one. If
one Thread is already expiring messages ...
--- ... let the other Threads continue without expiring
--- ... let the other Threads wait
- Go with your idea of an "Expiration Thread Pool"
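A minimal sketch of the first variant (a single expirer, other Threads continue), assuming a compare-and-set guard around the expiration pass. The names SingleExpirerBacklog, Message and expirationActive are hypothetical and are not taken from OutboundTcpConnection; this only illustrates the idea and is not a proposed patch:

```java
import java.util.Iterator;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch; not the actual OutboundTcpConnection code. */
final class SingleExpirerBacklog {
    /** Hypothetical stand-in for OutboundTcpConnection$QueuedMessage. */
    static final class Message {
        final long expiresAtNanos;
        Message(long expiresAtNanos) { this.expiresAtNanos = expiresAtNanos; }
    }

    private final LinkedBlockingQueue<Message> backlog = new LinkedBlockingQueue<>();
    // true while some Thread is walking the backlog to expire messages
    private final AtomicBoolean expirationActive = new AtomicBoolean(false);

    /**
     * Expires timed-out messages unless another Thread is already doing so.
     * Returns the number of removed messages, or -1 if the pass was skipped.
     */
    int expireMessages(long nowNanos) {
        if (!expirationActive.compareAndSet(false, true)) {
            return -1; // someone else is expiring: continue instead of waiting
        }
        try {
            int removed = 0;
            Iterator<Message> it = backlog.iterator();
            while (it.hasNext()) {
                if (it.next().expiresAtNanos <= nowNanos) {
                    it.remove();
                    removed++;
                }
            }
            return removed;
        } finally {
            expirationActive.set(false);
        }
    }

    /** Enqueues a message; at most one caller at a time pays for expiration. */
    void enqueue(Message m, long nowNanos) {
        expireMessages(nowNanos);
        backlog.add(m);
    }

    int size() { return backlog.size(); }

    public static void main(String[] args) {
        SingleExpirerBacklog q = new SingleExpirerBacklog();
        q.enqueue(new Message(10), 0);
        q.enqueue(new Message(100), 0);
        int expired = q.expireMessages(50);
        System.out.println("expired=" + expired + " remaining=" + q.size());
    }
}
```

With this guard, at most one Thread iterates the backlog at a time, and the others go straight to the add. The "let the other Threads wait" variant would replace the compare-and-set with a lock, which keeps the Threads parked as they are now.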
Please see the attached thread dump to see which Threads are blocking. They
are mainly SharedPool-Worker threads that are blocked in either
iterator.remove() or iterator.next().
{code}
"SharedPool-Worker-294" #587 daemon prio=5 os_prio=0 tid=0x00007fb69b11e260 nid=0x6090 waiting on condition [0x00007fb162c0e000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x000000023a426218> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
    at java.util.concurrent.LinkedBlockingQueue.fullyLock(LinkedBlockingQueue.java:225)
    at java.util.concurrent.LinkedBlockingQueue$Itr.remove(LinkedBlockingQueue.java:840)
    at org.apache.cassandra.net.OutboundTcpConnection.expireMessages(OutboundTcpConnection.java:555)
    at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:165)
    at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:771)
    at org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:744)
    at org.apache.cassandra.hints.HintVerbHandler.reply(HintVerbHandler.java:99)
    at org.apache.cassandra.hints.HintVerbHandler.doVerb(HintVerbHandler.java:94)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
    at java.lang.Thread.run(Thread.java:745)
{code}
{code}
"SharedPool-Worker-295" #590 daemon prio=5 os_prio=0 tid=0x00007fb69b1135b0 nid=0x608d waiting on condition [0x00007fb162cd1000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x000000023a426218> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
    at java.util.concurrent.LinkedBlockingQueue.fullyLock(LinkedBlockingQueue.java:225)
    at java.util.concurrent.LinkedBlockingQueue$Itr.next(LinkedBlockingQueue.java:823)
    at org.apache.cassandra.net.OutboundTcpConnection.expireMessages(OutboundTcpConnection.java:550)
    at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:165)
    at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:771)
    at org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:744)
    at org.apache.cassandra.hints.HintVerbHandler.reply(HintVerbHandler.java:99)
    at org.apache.cassandra.hints.HintVerbHandler.doVerb(HintVerbHandler.java:94)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
    at java.lang.Thread.run(Thread.java:745)
{code}
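Both traces reach expireMessages() from enqueue(), and both park in LinkedBlockingQueue.fullyLock() via the iterator. The following rough, single-threaded sketch shows why this gets expensive with a large backlog: if every enqueue walks the whole queue, the number of iterator steps (each of which takes the full lock, per the traces) grows with backlog size times number of enqueues. The class ExpireOnEnqueueCost and the use of plain Long expiry timestamps are assumptions made for illustration only:

```java
import java.util.Iterator;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch that counts iterator steps; not Cassandra code. */
final class ExpireOnEnqueueCost {
    /**
     * Simulates `enqueues` enqueue calls against a backlog that already holds
     * `backlogSize` unexpired entries, and returns how many times
     * iterator.next() was called in total.
     */
    static long iteratorSteps(int backlogSize, int enqueues) {
        LinkedBlockingQueue<Long> backlog = new LinkedBlockingQueue<>();
        for (int i = 0; i < backlogSize; i++) {
            backlog.add(Long.MAX_VALUE); // expiry timestamp: never expires
        }
        long now = 0;
        long steps = 0;
        for (int e = 0; e < enqueues; e++) {
            // expiration pass on every enqueue, as in the traces above
            Iterator<Long> it = backlog.iterator();
            while (it.hasNext()) {
                if (it.next() <= now) {
                    it.remove();
                }
                steps++;
            }
            backlog.add(Long.MAX_VALUE);
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(iteratorSteps(1000, 3)); // 1000 + 1001 + 1002 steps
    }
}
```

With several hundred Threads doing this concurrently against the same lock, each Thread's pass also has to wait for the others at every step, which matches the parked state seen in the dump.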
> Communication breakdown in OutboundTcpConnection
> ------------------------------------------------
>
> Key: CASSANDRA-13265
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13265
> Project: Cassandra
> Issue Type: Bug
> Environment: Cassandra 3.0.9
> Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version
> 1.8.0_112-b15)
> Linux 3.16
> Reporter: Christian Esken
> Assignee: Christian Esken
> Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz,
> cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz
>
>
> I observed that sometimes a single node in a Cassandra cluster fails to
> communicate with the other nodes. This can happen at any time, during peak
> load or low load. Restarting that single node fixes the issue.
> Before going into details, I want to state that I have analyzed the
> situation and am already developing a possible fix. Here is the analysis so
> far:
> - A thread dump in this situation showed 324 Threads in the
> OutboundTcpConnection class that want to lock the backlog queue for doing
> expiration.
> - A class histogram shows 262508 instances of
> OutboundTcpConnection$QueuedMessage.
> What is the effect of it? As soon as the Cassandra node has reached a certain
> number of queued messages, it starts thrashing itself to death. Each of the
> Threads fully locks the Queue for reading and writing by calling
> iterator.next(), making the situation worse and worse.
> - Writing: Only after 262508 locking operations can a Thread progress with
> actually writing to the Queue.
> - Reading: Is also blocked, as 324 Threads try to do iterator.next(), and
> fully lock the Queue
> This means: Writing blocks the Queue for reading, and readers might even be
> starved, which makes the situation even worse.
> -----
> The setup is:
> - 3-node cluster
> - replication factor 2
> - Consistency LOCAL_ONE
> - No remote DCs
> - high write throughput (100000 INSERT statements per second and more during
> peak times).
>