[ https://issues.apache.org/jira/browse/CASSANDRA-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15908545#comment-15908545 ]

Ariel Weisberg commented on CASSANDRA-13265:
--------------------------------------------

The right way to do it is to create a branch for each of the versions where this 
is going to be fixed. Start at 2.2, merge to 3.0, merge to 3.11, then merge to 
trunk. 

You can get away with one field. Check the next expiration time, CAS it to 
{{Long.MAX_VALUE}}, then when you are done store the next expiration time in 
it. Doing it with two fields also works. I wouldn't bother changing it.
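
For illustration, here is a minimal sketch of the single-field variant. The 
field and method names below are made up for the example, not taken from the 
patch:

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: Long.MAX_VALUE doubles as the "an expiration pass is running"
// sentinel, so one field covers both the schedule and the in-progress flag.
class BacklogExpirationSketch
{
    private static final long EXPIRATION_INTERVAL_NANOS = TimeUnit.MILLISECONDS.toNanos(200);

    private final AtomicLong backlogNextExpirationTime = new AtomicLong(0);

    void maybeExpireMessages(long nowNanos)
    {
        long next = backlogNextExpirationTime.get();
        // Not due yet, or another thread already claimed this expiration pass.
        if (nowNanos < next || !backlogNextExpirationTime.compareAndSet(next, Long.MAX_VALUE))
            return;
        try
        {
            expireMessages(nowNanos); // hypothetical helper that drops expired backlog entries
        }
        finally
        {
            // Done: store the next expiration time, which also releases the sentinel.
            backlogNextExpirationTime.set(nowNanos + EXPIRATION_INTERVAL_NANOS);
        }
    }

    private void expireMessages(long nowNanos)
    {
        // Iterate the backlog and drop messages whose timeout has elapsed.
    }
}
{code}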

If you set the default value via a property it will work fine. It will be set 
once when the class is loaded at startup and then overwritten by the YAML 
contents or JMX invocations. Generally we do set the default value in config 
via assignment. Doing it via a property gives yet another way to set the value, 
but it's the least important one. It's more useful for things that aren't in 
the YAML. Adding YAML properties adds a bit of boilerplate.
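
As a rough sketch of what I mean (the field name, property name and default 
below are illustrative, not what the patch has to use):

{code:java}
// Illustrative only: the field name and system property are invented for the
// example. The property supplies the compiled-in default exactly once, when the
// Config class is loaded; loading the YAML or a JMX invocation can still
// overwrite the field afterwards.
public class Config
{
    public int otc_backlog_expiration_interval_ms =
        Integer.getInteger("cassandra.otc_backlog_expiration_interval_ms", 200);
}
{code}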

* A smaller value could potentially expire messages slightly sooner, at the 
expense of more CPU time and queue contention while iterating the backlog of 
messages.
* [You shouldn't need the check for null? Usually we "just" make sure it's not 
null and skip the 
boilerplate.|https://github.com/apache/cassandra/pull/95/files#diff-a8a9935b164cd23da473fd45784fd1ddR1973]
* [Avoid unrelated whitespace 
changes.|https://github.com/apache/cassandra/pull/95/files#diff-a8a9935b164cd23da473fd45784fd1ddL2034]
* [I still think it's a good idea to avoid hard-coding this kind of value so 
operators have options without 
recompiling.|https://github.com/apache/cassandra/pull/95/files#diff-c7ef124561c4cde1c906f28ad3883a88R139]
* Fun fact: you don't need {{backlogNextExpirationTime}} to be volatile. You 
can piggyback on {{backlogExpirationActive}} to get the desired effects from 
the Java memory model. A store to {{backlogExpirationActive}} makes prior 
stores (by the current thread) to {{backlogNextExpirationTime}} visible. A read 
of {{backlogExpirationActive}} would make prior stores to 
{{backlogNextExpirationTime}} by the last writer to {{backlogExpirationActive}} 
visible (see the sketch after this list). That said, the volatile read is close 
to free, so I wouldn't change it; keeping the field volatile means correctness 
isn't sensitive to the order the fields are accessed in.
* [Breaking out the uber bike-shedding: this could be named 
maybeExpireMessages.|https://github.com/apache/cassandra/pull/95/files#diff-c7ef124561c4cde1c906f28ad3883a88R600]
* [Swap the order of these two stores so it doesn't do extra 
expirations.|https://github.com/apache/cassandra/pull/95/files#diff-c7ef124561c4cde1c906f28ad3883a88R636]
* [Using a boxed integer makes it a bit confusing because now everyone has to 
know how null is handled. What's the diff between null and 0? Better to let 0 
be disabled and not have 
null.|https://github.com/apache/cassandra/pull/95/files#diff-097eb77c5d9d80a48e7547fbb81822caR62]
* [This is not quite correct: you can't count drainCount as dropped, because 
some of the drained messages may have been sent during 
iteration.|https://github.com/apache/cassandra/pull/95/files#diff-c7ef124561c4cde1c906f28ad3883a88R270]
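
To illustrate the piggybacking from the "fun fact" bullet above, here is a 
sketch with made-up field names (not the actual patch):

{code:java}
// Sketch of the Java memory model piggybacking described above; names are
// illustrative. The volatile write to backlogExpirationActive publishes the
// preceding plain write to backlogNextExpirationTime, and a reader that
// observes that volatile write also sees the plain write -- but only if it
// reads the volatile field first, which is why keeping both fields volatile
// is the less fragile choice.
class ExpirationStateSketch
{
    private long backlogNextExpirationTime;             // plain field, published by the volatile below
    private volatile boolean backlogExpirationActive;

    void finishExpiration(long nextExpirationTime)
    {
        backlogNextExpirationTime = nextExpirationTime; // plain store
        backlogExpirationActive = false;                // volatile store publishes the store above
    }

    boolean shouldExpire(long nowNanos)
    {
        if (backlogExpirationActive)                    // volatile read must come first
            return false;
        // Visible here if we just observed that writer's volatile store above.
        return nowNanos >= backlogNextExpirationTime;
    }
}
{code}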

> Expiration in OutboundTcpConnection can block the reader Thread
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-13265
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13265
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Cassandra 3.0.9
> Java HotSpot(TM) 64-Bit Server VM version 25.112-b15 (Java version 
> 1.8.0_112-b15)
> Linux 3.16
>            Reporter: Christian Esken
>            Assignee: Christian Esken
>             Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
>         Attachments: cassandra.pb-cache4-dus.2017-02-17-19-36-26.chist.xz, 
> cassandra.pb-cache4-dus.2017-02-17-19-36-26.td.xz
>
>
> I observed that sometimes a single node in a Cassandra cluster fails to 
> communicate with the other nodes. This can happen at any time, during peak load 
> or low load. Restarting that single node fixes the issue.
> Before going into details, I want to state that I have analyzed the 
> situation and am already developing a possible fix. Here is the analysis so 
> far:
> - A thread dump in this situation showed 324 Threads in the 
> OutboundTcpConnection class that want to lock the backlog queue for doing 
> expiration.
> - A class histogram shows 262508 instances of 
> OutboundTcpConnection$QueuedMessage.
> What is the effect of it? As soon as the Cassandra node has reached a certain 
> number of queued messages, it starts thrashing itself to death. Each of the 
> Threads fully locks the Queue for reading and writing by calling 
> iterator.next(), making the situation worse and worse.
> - Writing: Only after 262508 locking operations can it progress with actually 
> writing to the Queue.
> - Reading: Is also blocked, as 324 Threads try to do iterator.next() and 
> fully lock the Queue.
> This means: writing blocks the Queue for reading, and readers might even be 
> starved, which makes the situation even worse.
> -----
> The setup is:
>  - 3-node cluster
>  - replication factor 2
>  - Consistency LOCAL_ONE
>  - No remote DCs
>  - high write throughput (100000 INSERT statements per second and more during 
> peak times).
>  


