[
https://issues.apache.org/jira/browse/CASSANDRA-13039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Corentin Chary updated CASSANDRA-13039:
---------------------------------------
Summary: Mutation time mostly spent in LinkedBlockingQueue.put() when
writing with ONE (was: Mutation time mostly spent in LinkedBlockingQueue.put())
> Mutation time mostly spent in LinkedBlockingQueue.put() when writing with ONE
> -----------------------------------------------------------------------------
>
> Key: CASSANDRA-13039
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13039
> Project: Cassandra
> Issue Type: Bug
> Components: Coordination
> Reporter: Corentin Chary
> Attachments: mutation-linkedlist-block.png, profiler-snapshot.nps
>
>
> On a setup with a sustained write load of 70kQPS per node and an RF of 2, it
> looks like most of the mutation time is spent in
> OutboundTcpConnection.enqueue() -> backlog.put().
> backlog is an unbounded LinkedBlockingQueue, so put() never has to wait for
> capacity and can only block while acquiring a lock. I strongly suspect that
> this is caused by the use of drainTo() in CoalescingStrategies, which is
> causing contention for the producers.
> On the other hand, not using drainTo() could lead to starvation of the
> consumers.
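
The access pattern described above can be sketched as follows. This is a minimal illustration and not Cassandra code: the class name BacklogDrainSketch, the run() helper, and the batch size of 128 are assumptions made for the example. Several producers call put() on an unbounded LinkedBlockingQueue while a single consumer blocks on take() and then batches with drainTo().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the producer/consumer pattern in the report.
final class BacklogDrainSketch {

    /** Starts `producers` threads that each put() `perProducer` messages;
     *  returns how many messages the single consumer received. */
    static int run(int producers, int perProducer) {
        // Unbounded, like the backlog queue described in the report.
        BlockingQueue<Integer> backlog = new LinkedBlockingQueue<>();
        int expected = producers * perProducer;

        List<Thread> threads = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            Thread t = new Thread(() -> {
                try {
                    for (int i = 0; i < perProducer; i++) {
                        // Never waits for capacity, but still goes through
                        // the queue's internal locking.
                        backlog.put(i);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }

        int received = 0;
        List<Integer> batch = new ArrayList<>(128);
        try {
            while (received < expected) {
                batch.add(backlog.take());   // block until one message is available
                backlog.drainTo(batch, 127); // then grab up to 127 more in one call
                received += batch.size();
                batch.clear();
            }
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("consumer interrupted", e);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000));
    }
}
```

The sketch only reproduces the access pattern. It does not measure contention; a profiler or a benchmark harness would be needed to confirm where the time actually goes.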
> Possible solutions:
> - Allow multiple connections per size and per host in
> OutboundTcpConnectionPool
> - Switch from drainTo() to multiple take() calls
> - Switch to ConcurrentLinkedQueue (which is lock-free), but this means we
> need active polling.
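
To make the last option concrete, here is a rough sketch of what "active polling" on a ConcurrentLinkedQueue could look like. Everything in it is an assumption for illustration: the class name PollingConsumerSketch, the spin threshold of 100, and the 50 microsecond park interval are arbitrary and would need tuning against a real workload.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.LockSupport;

// Hypothetical illustration of a polling consumer; not Cassandra code.
final class PollingConsumerSketch {

    /** Starts `producers` threads that each offer() `perProducer` messages;
     *  returns how many messages the polling consumer received. */
    static int run(int producers, int perProducer) {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        int expected = producers * perProducer;

        Thread[] threads = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            threads[p] = new Thread(() -> {
                for (int i = 0; i < perProducer; i++) {
                    queue.offer(i); // non-blocking insert on an unbounded queue
                }
            });
            threads[p].start();
        }

        int received = 0;
        int idleSpins = 0;
        while (received < expected) {
            Integer msg = queue.poll(); // returns null when the queue is empty
            if (msg != null) {
                received++;
                idleSpins = 0;
            } else if (++idleSpins < 100) {
                Thread.yield();                // brief spin while traffic is likely
            } else {
                LockSupport.parkNanos(50_000L); // back off to avoid burning a core
            }
        }

        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000));
    }
}
```

The trade-off is visible in the consumer loop: there is no blocking take(), so an idle consumer either spins (CPU cost) or parks for a fixed interval (latency cost).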
> Maybe a good solution would be something hybrid: a bounded
> LinkedBlockingQueue and an unbounded ConcurrentLinkedQueue. This way you get
> low latency when you don't have a lot of messages, and throughput when you do.
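
One possible reading of the hybrid idea is sketched below; the report does not specify a design, so this is an assumption throughout. The class HybridBacklogSketch, its enqueue() and drainInto() methods, and the capacity and timeout parameters are all hypothetical. Producers try the small bounded LinkedBlockingQueue first and spill into the ConcurrentLinkedQueue when it is full; the consumer waits on the bounded queue and then empties the overflow. Note that this simple version does not preserve ordering between the two queues under concurrent load.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical hybrid backlog; an interpretation, not a proposed patch.
final class HybridBacklogSketch<T> {
    private final LinkedBlockingQueue<T> fast;           // bounded, blocking reads
    private final ConcurrentLinkedQueue<T> overflow =    // unbounded, non-blocking
            new ConcurrentLinkedQueue<>();

    HybridBacklogSketch(int fastCapacity) {
        this.fast = new LinkedBlockingQueue<>(fastCapacity);
    }

    /** Producer side: never waits; spills to the overflow queue when full. */
    void enqueue(T message) {
        if (!fast.offer(message)) {
            overflow.offer(message);
        }
    }

    /** Consumer side: waits up to timeoutMillis for a message on the bounded
     *  queue, then takes up to maxBatch messages from both queues. */
    int drainInto(List<T> out, int maxBatch, long timeoutMillis) {
        int taken = 0;
        T m;
        try {
            m = fast.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            m = fast.poll(); // fall back to a non-blocking attempt
        }
        while (m != null) {
            out.add(m);
            if (++taken >= maxBatch) {
                return taken;
            }
            m = fast.poll();
        }
        while (taken < maxBatch && (m = overflow.poll()) != null) {
            out.add(m);
            taken++;
        }
        return taken;
    }

    /** Demo: returns the number of messages the consumer received. */
    static int run(int producers, int perProducer, int fastCapacity) {
        HybridBacklogSketch<Integer> backlog = new HybridBacklogSketch<>(fastCapacity);
        int expected = producers * perProducer;
        Thread[] threads = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            threads[p] = new Thread(() -> {
                for (int i = 0; i < perProducer; i++) {
                    backlog.enqueue(i);
                }
            });
            threads[p].start();
        }
        int received = 0;
        List<Integer> batch = new ArrayList<>();
        while (received < expected) {
            received += backlog.drainInto(batch, 128, 1L);
            batch.clear();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return received;
    }
}
```

With a light load most messages stay on the bounded queue, where the consumer's timed poll() wakes promptly. Under a burst, producers fall through to the non-blocking overflow queue, which the consumer picks up at the latest after the poll timeout expires.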