[
https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15964943#comment-15964943
]
Ariel Weisberg edited comment on CASSANDRA-13265 at 4/13/17 4:48 PM:
---------------------------------------------------------------------
There seem to be some build issues in various branches? Maybe because I rebased?
You should register with CircleCI so it will automatically build and run the
unit tests for you out of your repo when you commit. When you rebase there will
be a circle.yml in each branch that will automatically have it run the build.
||Code|utests|dtests||
|[2.2|https://github.com/aweisberg/cassandra/pull/new/cassandra-13265-2.2]|[utests|https://circleci.com/gh/aweisberg/cassandra/tree/cassandra-13265-2%2E2]||
|[3.0|https://github.com/aweisberg/cassandra/pull/new/cassandra-13265-3.0]|[utests|https://circleci.com/gh/aweisberg/cassandra/tree/cassandra-13265-3%2E0]||
|[3.11|https://github.com/aweisberg/cassandra/pull/new/cassandra-13265-3.11]|[utests|https://circleci.com/gh/aweisberg/cassandra/tree/cassandra-13265-3%2E11]||
|[trunk|https://github.com/aweisberg/cassandra/pull/new/cassandra-13265-trunk]|[utests|https://circleci.com/gh/aweisberg/cassandra/tree/cassandra-13265-trunk]||
was (Author: aweisberg):
There seem to be some build issues in various branches? Maybe because I rebased?
You should register with CircleCI so it will automatically build and run the
unit tests for you out of your repo when you commit. When you rebase there will
be a circle.yml in each branch that will automatically have it run the build.
||Code|utests|dtests||
|[2.2|https://github.com/aweisberg/cassandra/pull/new/cassandra-13265-2.2]|[utests|https://circleci.com/gh/aweisberg/cassandra/tree/cassandra-13265-2%2E2]||
|[3.0|https://github.com/aweisberg/cassandra/pull/new/cassandra-13265-3.0]|[utests|https://circleci.com/gh/aweisberg/cassandra/tree/cassandra-13265-3%2E0]||
|[3.11|https://github.com/aweisberg/cassandra/pull/new/cassandra-13265-3.11]|[utests|https://circleci.com/gh/aweisberg/cassandra/tree/cassandra-13265-3%2E11]||
|[trunk|https://github.com/aweisberg/cassandra/pull/new/cassandra-13265-trunk]|[utests|https://circleci.com/gh/aweisberg/cassandra/tree/cassandra-13625-trunk]||
> Expiration in OutboundTcpConnection can block the reader Thread
> ---------------------------------------------------------------
>
> Key: CASSANDRA-13265
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13265
> Project: Cassandra
> Issue Type: Bug
> Environment: Cassandra 3.0.9
> Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version
> 1.8.0_112-b15)
> Linux 3.16
> Reporter: Christian Esken
> Assignee: Christian Esken
> Fix For: 3.0.x
>
> Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz,
> cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz
>
>
> I observed that sometimes a single node in a Cassandra cluster fails to
> communicate to the other nodes. This can happen at any time, during peak load
> or low load. Restarting that single node from the cluster fixes the issue.
> Before going in to details, I want to state that I have analyzed the
> situation and am already developing a possible fix. Here is the analysis so
> far:
> - A Threaddump in this situation showed 324 Threads in the
> OutboundTcpConnection class that want to lock the backlog queue for doing
> expiration.
> - A class histogram shows 262508 instances of
> OutboundTcpConnection$QueuedMessage.
> What is the effect of it? As soon as the Cassandra node has reached a certain
> amount of queued messages, it starts thrashing itself to death. Each of the
> Thread fully locks the Queue for reading and writing by calling
> iterator.next(), making the situation worse and worse.
> - Writing: Only after 262508 locking operation it can progress with actually
> writing to the Queue.
> - Reading: Is also blocked, as 324 Threads try to do iterator.next(), and
> fully lock the Queue
> This means: Writing blocks the Queue for reading, and readers might even be
> starved which makes the situation even worse.
> -----
> The setup is:
> - 3-node cluster
> - replication factor 2
> - Consistency LOCAL_ONE
> - No remote DC's
> - high write throughput (100000 INSERT statements per second and more during
> peak times).
>
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)