[jira] [Commented] (CASSANDRA-15375) back pressure log line is misleading

Benedict Elliott Smith (Jira) Sat, 22 Feb 2020 14:37:10 -0800


    [ https://issues.apache.org/jira/browse/CASSANDRA-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042734#comment-17042734 ]


Benedict Elliott Smith commented on CASSANDRA-15375:
----------------------------------------------------

bq. since it's never really been tested at scale that I know of

^ This was a euphemism for “this feature has never been used, and is probably 
bad”.  It was implemented some time ago by DataStax, never advertised in any 
way by OSS, and has never been updated (making it either the first perfect 
feature, or broken).  It may have been used by DataStax in their own 
offerings, but never by OSS.  It is unlikely that many (or any) users even know it exists.

Given the 4.0 networking changes, this feature no longer provides any stability 
benefit.  We now limit the amount of inbound data from any single coordinator 
(and from all coordinators combined) so that a node cannot be overwhelmed, and 
vice versa, and these limits apply immediately, i.e. responsively*.
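To make the "any single (and all) coordinators" idea concrete, here is a minimal sketch of a two-level inbound limit, assuming a simple byte budget per coordinator plus one shared global budget that are both checked per message. The class {{InboundLimiter}}, its methods and the limit values are hypothetical illustrations, not Cassandra's 4.0 internode implementation:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch: a message's bytes are admitted only if both the
 * per-coordinator budget and the global (all coordinators) budget have room.
 * Not Cassandra's actual code; names and limits are illustrative.
 */
final class InboundLimiter {
    private final long perEndpointLimit;
    private final long globalLimit;
    private final AtomicLong globalUsed = new AtomicLong();
    private final ConcurrentHashMap<String, AtomicLong> endpointUsed = new ConcurrentHashMap<>();

    InboundLimiter(long perEndpointLimit, long globalLimit) {
        this.perEndpointLimit = perEndpointLimit;
        this.globalLimit = globalLimit;
    }

    /** Lock-free reservation: succeeds only if counter + bytes stays within limit. */
    private static boolean tryReserve(AtomicLong counter, long bytes, long limit) {
        while (true) {
            long current = counter.get();
            if (current + bytes > limit) return false;
            if (counter.compareAndSet(current, current + bytes)) return true;
        }
    }

    /** Checked immediately for each message, so the limit responds at once. */
    boolean tryAcquire(String endpoint, long bytes) {
        AtomicLong used = endpointUsed.computeIfAbsent(endpoint, k -> new AtomicLong());
        if (!tryReserve(used, bytes, perEndpointLimit)) return false;
        if (!tryReserve(globalUsed, bytes, globalLimit)) {
            used.addAndGet(-bytes); // roll back the per-endpoint reservation
            return false;
        }
        return true;
    }

    /** Called once the message has been processed and its bytes freed. */
    void release(String endpoint, long bytes) {
        endpointUsed.get(endpoint).addAndGet(-bytes);
        globalUsed.addAndGet(-bytes);
    }

    public static void main(String[] args) {
        InboundLimiter limiter = new InboundLimiter(100, 150);
        System.out.println(limiter.tryAcquire("A", 80)); // true
        System.out.println(limiter.tryAcquire("A", 30)); // false: A would exceed 100
        System.out.println(limiter.tryAcquire("B", 80)); // false: global would reach 160
        System.out.println(limiter.tryAcquire("B", 70)); // true: global is now 150
        limiter.release("A", 80);
        System.out.println(limiter.tryAcquire("B", 30)); // true: B=100, global=100
    }
}
```

Checking both budgets on every message is what makes this kind of limit responsive: there is no periodic rate recalculation, so a burst is refused as soon as it would exceed either budget.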

This feature, however, makes some basic implementation errors, and appears to 
have several problematic semantics, particularly around vnodes, responsiveness 
and choppiness.  It imposes three arbitrary rates (LOW, HIGH, INFINITE) for every 
unique combination of message recipients (probably very problematic with 
vnodes and high RF), and these rates are updated only once every WriteRpcTimeout, 
assuming the system clock doesn’t get adjusted by e.g. NTP.
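A rough back-of-the-envelope illustration of why per-recipient-combination state scales badly with vnodes: assuming each token range maps to one replica set of size RF, the number of distinct replica sets a coordinator might track is capped by both the number of token ranges (nodes × num_tokens) and the number of RF-sized subsets of the nodes. This ignores multi-DC placement and any recipient subsets chosen per consistency level; the class {{ReplicaSetBound}} is a hypothetical helper:

```java
import java.math.BigInteger;

/**
 * Hypothetical estimate: upper bound on distinct replica sets, assuming one
 * replica set of size rf per token range. Ignores multi-DC placement.
 */
final class ReplicaSetBound {
    /** Exact binomial coefficient C(n, k); each intermediate division is exact. */
    static BigInteger binomial(int n, int k) {
        BigInteger result = BigInteger.ONE;
        for (int i = 0; i < k; i++) {
            result = result.multiply(BigInteger.valueOf(n - i))
                           .divide(BigInteger.valueOf(i + 1));
        }
        return result;
    }

    /** min(token ranges, possible rf-sized subsets of the nodes). */
    static BigInteger maxDistinctReplicaSets(int nodes, int numTokens, int rf) {
        BigInteger ranges = BigInteger.valueOf((long) nodes * numTokens);
        return ranges.min(binomial(nodes, rf));
    }

    public static void main(String[] args) {
        System.out.println(maxDistinctReplicaSets(100, 1, 3));   // 100: one range per node
        System.out.println(maxDistinctReplicaSets(100, 256, 3)); // 25600: capped by range count
        System.out.println(maxDistinctReplicaSets(4, 256, 3));   // 4: every 3-of-4 combination
    }
}
```

Under these assumptions, moving from one token per node to 256 turns about a hundred rate-limited recipient sets into tens of thousands on a 100-node cluster.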

The only behaviour missing from internode messaging is the ability to notify clients of 
back pressure, either by propagating it to the client connection or by throwing 
overloaded exceptions.  However, this is also implemented poorly here: it “applies 
backpressure” by consuming a {{RequestPoolExecutor}} thread until the request is 
permitted to proceed.  Thanks to CASSANDRA-15013 this will now only be suboptimal, 
but prior to 4.0 it would have led to really problematic cluster behaviours.
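The difference between the two ways of signalling overload can be sketched with a plain {{java.util.concurrent.Semaphore}}: blocking parks the calling request thread until a permit frees up, while failing fast releases the thread immediately and gives the client something to react to. The class {{BackPressureStyles}} and its nested {{OverloadedException}} are hypothetical stand-ins, not Cassandra's classes:

```java
import java.util.concurrent.Semaphore;

/**
 * Hypothetical contrast: blocking a request thread until permitted versus
 * rejecting immediately with an "overloaded" error.
 */
final class BackPressureStyles {
    /** Illustrative stand-in, not Cassandra's OverloadedException. */
    static final class OverloadedException extends RuntimeException {
        OverloadedException(String message) { super(message); }
    }

    private final Semaphore permits;

    BackPressureStyles(int maxInFlight) {
        permits = new Semaphore(maxInFlight);
    }

    /** Blocking style: the calling thread is consumed for as long as it waits. */
    void admitBlocking() throws InterruptedException {
        permits.acquire();
    }

    /** Fail-fast style: reject at once so the thread is freed and the client is told. */
    void admitOrReject() {
        if (!permits.tryAcquire()) {
            throw new OverloadedException("too many in-flight requests");
        }
    }

    /** Return the permit when the request finishes. */
    void complete() {
        permits.release();
    }

    public static void main(String[] args) {
        BackPressureStyles styles = new BackPressureStyles(1);
        styles.admitOrReject();          // takes the only permit
        try {
            styles.admitOrReject();      // no permit left, so this is rejected
        } catch (OverloadedException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        styles.complete();
        styles.admitOrReject();          // succeeds again after the release
        System.out.println("admitted");
    }
}
```

With the blocking variant, every waiting request pins an executor thread, so a slow replica set can exhaust the pool; the fail-fast variant trades that for a client-visible error.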

It’s worth noting that all of the above was perhaps a reasonable set of trade-offs 
when first implemented, though the original ticket (CASSANDRA-9318) led to a great 
deal of debate about the reasonableness of the approach.  However, it also suggests 
to me that we are better off removing this unused, unmaintained feature that is no 
longer particularly needed and, if we have time, implementing a version that makes 
sense in the current context.

(*That all said, 4.0 stability at scale is part of the 4.0 testing plan, and 
determining reasonable values for the limits is a remaining exercise; they 
are almost certainly too high today to guarantee stability.)


> back pressure log line is misleading
> ------------------------------------
>
>                 Key: CASSANDRA-15375
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15375
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Observability/Logging
>            Reporter: Jon Haddad
>            Assignee: Jon Haddad
>            Priority: Low
>
> This is odd:
> {{INFO [main] 2019-10-25 10:33:07,985 DatabaseDescriptor.java:803 - 
> Back-pressure is disabled with strategy 
> org.apache.cassandra.net.RateBasedBackPressure\{high_ratio=0.9, factor=5, 
> flow=FAST}.}}
> When I saw that, I wasn't sure if back pressure was actually disabled, or if 
> I was really using {{RateBasedBackPressure}}.
> This should change to output either:
> {{Back-pressure is disabled}}
> or
> {{Back-pressure is enabled with strategy 
> org.apache.cassandra.net.RateBasedBackPressure\{high_ratio=0.9, factor=5, 
> flow=FAST}.}}
>  
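The message the description asks for can be sketched as a single conditional that mentions the strategy only when back-pressure is enabled. The class {{BackPressureLogLine}} and method {{describe}} are hypothetical, not the actual code in {{DatabaseDescriptor.java}}; in practice the returned string would be passed to a logger such as SLF4J's {{logger.info}}:

```java
/**
 * Hypothetical sketch of the proposed log wording: only name the strategy
 * when back-pressure is actually enabled.
 */
final class BackPressureLogLine {
    static String describe(boolean enabled, String strategy) {
        return enabled
                ? "Back-pressure is enabled with strategy " + strategy + "."
                : "Back-pressure is disabled";
    }

    public static void main(String[] args) {
        String strategy = "org.apache.cassandra.net.RateBasedBackPressure"
                + "{high_ratio=0.9, factor=5, flow=FAST}";
        System.out.println(describe(false, strategy)); // Back-pressure is disabled
        System.out.println(describe(true, strategy));
    }
}
```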



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
