[
https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887685#comment-15887685
]
Christian Esken edited comment on CASSANDRA-13265 at 2/28/17 10:10 AM:
-----------------------------------------------------------------------
I see your argument. On larger clusters this may become problematic. I will try
to summarize the alternative solutions:
- Offload expiration to a "random" regular Thread, but only a single one. If
one Thread is already expiring messages ...
-- ... let the other Threads continue (1)
-- ... let the other Threads wait (2)
- Use an "Expiration Thread Pool" (3). I am not (currently) in favor of it,
and if I understood you correctly, it is not your preference either.
I will implement option (1) today.
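To illustrate option (1), here is a minimal sketch, not the actual patch. It assumes a guard flag that lets exactly one Thread walk the backlog while every other Thread skips expiration and continues with its enqueue. The class SingleExpirerBacklog, its Message type and the "expiring" flag are hypothetical stand-ins for OutboundTcpConnection and QueuedMessage, not Cassandra code.

```java
import java.util.Iterator;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the backlog handling in OutboundTcpConnection. */
class SingleExpirerBacklog {

    /** Hypothetical stand-in for QueuedMessage: a payload plus an absolute expiry time. */
    static final class Message {
        final String payload;
        final long expiresAtNanos;

        Message(String payload, long expiresAtNanos) {
            this.payload = payload;
            this.expiresAtNanos = expiresAtNanos;
        }

        boolean isExpired(long nowNanos) {
            return nowNanos - expiresAtNanos >= 0;
        }
    }

    private final BlockingQueue<Message> backlog = new LinkedBlockingQueue<>();

    // Assumption: true while some Thread is currently expiring messages.
    private final AtomicBoolean expiring = new AtomicBoolean(false);

    /** Tries to expire old messages, then appends the new message. */
    void enqueue(Message message, long nowNanos) {
        expireMessages(nowNanos);
        backlog.add(message);
    }

    /**
     * Returns the number of removed messages, or -1 if another Thread was
     * already expiring and this call skipped the work (option 1).
     */
    int expireMessages(long nowNanos) {
        if (!expiring.compareAndSet(false, true)) {
            return -1; // somebody else is iterating; do not queue up on the lock
        }
        try {
            int removed = 0;
            for (Iterator<Message> it = backlog.iterator(); it.hasNext(); ) {
                if (it.next().isExpired(nowNanos)) {
                    it.remove();
                    removed++;
                }
            }
            return removed;
        } finally {
            expiring.set(false);
        }
    }

    int size() {
        return backlog.size();
    }
}
```

With this shape, Threads that lose the compareAndSet race never call iterator.next() or iterator.remove(), so they do not pile up on the queue lock. Option (2) would instead block on a lock at the same place.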
Please see the attached Thread Dump to see which Threads are blocking. Here are
two examples from the Thread Dump. They are mainly SharedPool-Worker Threads
that are in either iterator.remove() or iterator.next(). I think the Thread Dump
also contains a HintDispatcher Thread that is parking on the same lock.
java.util.concurrent.LinkedBlockingQueue$Itr.remove:
{code}
"SharedPool-Worker-294" #587 daemon prio=5 os_prio=0 tid=0x00007fb69b11e260 nid=0x6090 waiting on condition [0x00007fb162c0e000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000023a426218> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at java.util.concurrent.LinkedBlockingQueue.fullyLock(LinkedBlockingQueue.java:225)
at java.util.concurrent.LinkedBlockingQueue$Itr.remove(LinkedBlockingQueue.java:840)
at org.apache.cassandra.net.OutboundTcpConnection.expireMessages(OutboundTcpConnection.java:555)
at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:165)
at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:771)
at org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:744)
at org.apache.cassandra.hints.HintVerbHandler.reply(HintVerbHandler.java:99)
at org.apache.cassandra.hints.HintVerbHandler.doVerb(HintVerbHandler.java:94)
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
at java.lang.Thread.run(Thread.java:745)
{code}
java.util.concurrent.LinkedBlockingQueue$Itr.next:
{code}
"SharedPool-Worker-295" #590 daemon prio=5 os_prio=0 tid=0x00007fb69b1135b0 nid=0x608d waiting on condition [0x00007fb162cd1000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000023a426218> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at java.util.concurrent.LinkedBlockingQueue.fullyLock(LinkedBlockingQueue.java:225)
at java.util.concurrent.LinkedBlockingQueue$Itr.next(LinkedBlockingQueue.java:823)
at org.apache.cassandra.net.OutboundTcpConnection.expireMessages(OutboundTcpConnection.java:550)
at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:165)
at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:771)
at org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:744)
at org.apache.cassandra.hints.HintVerbHandler.reply(HintVerbHandler.java:99)
at org.apache.cassandra.hints.HintVerbHandler.doVerb(HintVerbHandler.java:94)
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
at java.lang.Thread.run(Thread.java:745)
{code}
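To check how many Threads in a dump are parked on the same lock, the "parking to wait for" lines can be grouped by lock address. The following is only a sketch: the class ParkedThreadCounter and its method are hypothetical helpers, and it assumes the dump is available as one String in the textual format shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: counts parked Threads per lock address in a thread dump. */
class ParkedThreadCounter {

    // Matches lines such as "- parking to wait for <0x000000023a426218> (a ...)".
    private static final Pattern PARKED =
            Pattern.compile("parking to wait for\\s+<(0x[0-9a-fA-F]+)>");

    /** Returns a map from lock address to the number of Threads parked on it. */
    static Map<String, Integer> countByLock(String threadDump) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher matcher = PARKED.matcher(threadDump);
        while (matcher.find()) {
            counts.merge(matcher.group(1), 1, Integer::sum);
        }
        return counts;
    }
}
```

Applied to the two examples above, both Threads would be counted under the same address, 0x000000023a426218.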
was (Author: cesken):
I see your argument. On larger clusters this may become problematic. Let's
evaluate different solutions:
- Offload expiration to a "random" regular Thread, but only a single one. If
one Thread already expires ...
--- ... let the other Threads continue
--- ... let the other Threads wait
- Go with your idea of an "Expiration Thread Pool"
> Communication breakdown in OutboundTcpConnection
> ------------------------------------------------
>
> Key: CASSANDRA-13265
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13265
> Project: Cassandra
> Issue Type: Bug
> Environment: Cassandra 3.0.9
> Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version
> 1.8.0_112-b15)
> Linux 3.16
> Reporter: Christian Esken
> Assignee: Christian Esken
> Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz,
> cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz
>
>
> I observed that sometimes a single node in a Cassandra cluster fails to
> communicate with the other nodes. This can happen at any time, during peak
> load or low load. Restarting that single node fixes the issue.
> Before going into details, I want to state that I have analyzed the
> situation and am already developing a possible fix. Here is the analysis so
> far:
> - A Threaddump in this situation showed 324 Threads in the
> OutboundTcpConnection class that want to lock the backlog queue for doing
> expiration.
> - A class histogram shows 262508 instances of
> OutboundTcpConnection$QueuedMessage.
> What is the effect? As soon as the Cassandra node has accumulated a certain
> number of queued messages, it starts thrashing itself to death. Each of the
> Threads fully locks the Queue for reading and writing by calling
> iterator.next(), making the situation worse and worse.
> - Writing: Only after 262508 locking operations can a Thread progress with
> actually writing to the Queue.
> - Reading: Is also blocked, as 324 Threads try to do iterator.next() and
> fully lock the Queue.
> This means: Writing blocks the Queue for reading, and readers might even be
> starved, which makes the situation even worse.
> -----
> The setup is:
> - 3-node cluster
> - replication factor 2
> - Consistency LOCAL_ONE
> - No remote DCs
> - high write throughput (100000 INSERT statements per second and more during
> peak times).
>