[https://issues.apache.org/jira/browse/CASSANDRA-15375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042734#comment-17042734]
Benedict Elliott Smith commented on CASSANDRA-15375:
----------------------------------------------------
bq. since it's never really been tested at scale that I know of
^ This was a euphemism for “this feature has never been used, and is probably
bad”. It was implemented some time ago by DataStax, never advertised in any
way by OSS, and has never been updated (making it either the first perfect
feature, or broken). It has perhaps been used by DataStax in their own
offerings, but never by OSS. It is unlikely that many (if any) users even know it exists.
Given the 4.0 networking changes, this feature no longer provides any utility
for stability. We now limit the amount of data inbound from any specific (and
all) coordinators so that we cannot be overwhelmed, and vice-versa, and this
happens instantly, i.e. responsively*.
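As a rough illustration of bounding inbound data from any specific (and all) coordinators, the following sketch keeps one byte budget per peer plus one shared budget, and admits a message only if both have room. It is not Cassandra's 4.0 implementation; the class name, the String peer key and the limit values are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: per-peer and global caps on bytes accepted but not yet processed.
final class InboundLimits {
    private final long perPeerLimit;
    private final long globalLimit;
    private final AtomicLong globalUsed = new AtomicLong();
    private final ConcurrentHashMap<String, AtomicLong> perPeerUsed = new ConcurrentHashMap<>();

    InboundLimits(long perPeerLimit, long globalLimit) {
        this.perPeerLimit = perPeerLimit;
        this.globalLimit = globalLimit;
    }

    /** Reserves bytes for a message from peer; false means stop reading from that peer for now. */
    boolean tryAcquire(String peer, long bytes) {
        AtomicLong peerUsed = perPeerUsed.computeIfAbsent(peer, p -> new AtomicLong());
        if (!tryAdd(peerUsed, bytes, perPeerLimit)) {
            return false; // this peer alone would exceed its budget
        }
        if (!tryAdd(globalUsed, bytes, globalLimit)) {
            peerUsed.addAndGet(-bytes); // roll back the per-peer reservation
            return false; // all peers together would exceed the shared budget
        }
        return true;
    }

    /** Returns capacity once the message has been processed; peer must have acquired before. */
    void release(String peer, long bytes) {
        perPeerUsed.get(peer).addAndGet(-bytes);
        globalUsed.addAndGet(-bytes);
    }

    // Lock-free "add only if the result stays within limit".
    private static boolean tryAdd(AtomicLong counter, long bytes, long limit) {
        while (true) {
            long current = counter.get();
            long next = current + bytes;
            if (next > limit) {
                return false;
            }
            if (counter.compareAndSet(current, next)) {
                return true;
            }
        }
    }
}
```

In this sketch a caller that gets false would pause reading from that connection until release frees capacity, so the limit reacts as soon as data is consumed rather than on a timer.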
This feature, however, makes some basic implementation errors, and appears to
have several problematic semantics, particularly around vnodes, responsiveness
and choppiness. It imposes three arbitrary rates (LOW, HIGH, INFINITE) on every
unique combination of message recipients, which is probably really problematic
with vnodes and high RF, and it updates them only once every WriteRpcTimeout,
assuming the system clock doesn’t get updated by e.g. NTP in the meantime.
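The NTP caveat arises because, as the comment implies, the update interval is measured against wall-clock time, which can jump. A minimal sketch of measuring such an interval with a monotonic clock instead is shown here; the class name and the injected clock are hypothetical, and this is not how the existing back-pressure code is written.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical helper: decides when a periodic recalculation (e.g. of a rate) is due.
final class PeriodicWindow {
    private final long intervalNanos;
    private final LongSupplier nanoClock; // System::nanoTime in production, a fake clock in tests
    private long windowStart;

    PeriodicWindow(long interval, TimeUnit unit, LongSupplier nanoClock) {
        this.intervalNanos = unit.toNanos(interval);
        this.nanoClock = nanoClock;
        this.windowStart = nanoClock.getAsLong();
    }

    /** Returns true, and starts a new window, once a full interval has elapsed. */
    synchronized boolean elapsed() {
        long now = nanoClock.getAsLong();
        // Compare differences, as the System.nanoTime docs recommend, so overflow is harmless.
        if (now - windowStart >= intervalNanos) {
            windowStart = now;
            return true;
        }
        return false;
    }
}
```

Because System.nanoTime is only meaningful for elapsed time and is unaffected by wall-clock adjustments, an NTP step would not shorten or stretch the window in this sketch.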
The only behaviour missing from internode messaging is the ability to notify
clients of back pressure, either by propagating it to the client connection or by
throwing overloaded exceptions. However, this is also implemented poorly here: it
“applies backpressure” by consuming a {{RequestPoolExecutor}} thread until
permitted to proceed. Thanks to CASSANDRA-15013 this will only be suboptimal, but
prior to 4.0 this would have led to really problematic cluster behaviours.
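To make the contrast concrete, here is a hypothetical sketch using a plain {{java.util.concurrent.Semaphore}}: one method parks the calling request thread until a permit frees up (the style criticised above), the other rejects immediately so the client can be told it is overloaded. The class and its nested exception are illustrative stand-ins, not Cassandra's own classes.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch contrasting blocking admission with fail-fast admission.
final class AdmissionControl {
    // Local stand-in for an "overloaded" error surfaced to the client.
    static final class OverloadedException extends RuntimeException {
        OverloadedException(String message) {
            super(message);
        }
    }

    private final Semaphore permits;

    AdmissionControl(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    // Blocking style: the request thread is consumed while it waits for a permit.
    void admitBlocking() throws InterruptedException {
        permits.acquire();
    }

    // Fail-fast style: reject at once and leave the thread free for other work.
    void admitOrReject() {
        if (!permits.tryAcquire()) {
            throw new OverloadedException("too many in-flight requests");
        }
    }

    // Called when an admitted request finishes.
    void complete() {
        permits.release();
    }
}
```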
It’s worth noting that the above was all perhaps a reasonable set of trade-offs
when first implemented, though the original ticket (CASSANDRA-9318) led to a great
deal of debate about the reasonableness of the approach. However, it also
suggests to me that we are better off removing this unused, unmaintained feature,
which is no longer particularly needed, and, if we have time, implementing the
version that makes sense in the current context.
(*That all said, 4.0 stability at scale is part of the 4.0 testing plan, and
determining reasonable numbers for the limits is a remaining exercise; they
are almost certainly too high today to guarantee stability.)
> back pressure log line is misleading
> ------------------------------------
>
> Key: CASSANDRA-15375
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15375
> Project: Cassandra
> Issue Type: Bug
> Components: Observability/Logging
> Reporter: Jon Haddad
> Assignee: Jon Haddad
> Priority: Low
>
> This is odd:
> {{INFO [main] 2019-10-25 10:33:07,985 DatabaseDescriptor.java:803 -
> Back-pressure is disabled with strategy
> org.apache.cassandra.net.RateBasedBackPressure\{high_ratio=0.9, factor=5,
> flow=FAST}.}}
> When I saw that, I wasn't sure if back pressure was actually disabled, or if
> I was really using {{RateBasedBackPressure}}.
> This should change to output either:
> {{Back-pressure is disabled}}
> or
> {{Back-pressure is enabled with strategy
> org.apache.cassandra.net.RateBasedBackPressure\{high_ratio=0.9, factor=5,
> flow=FAST}.}}
>
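A minimal sketch of the proposed message selection, assuming the caller already knows whether back-pressure is enabled and has the configured strategy object. The class name, method name and parameters are hypothetical stand-ins, not the actual {{DatabaseDescriptor}} code.

```java
// Hypothetical helper: picks an unambiguous start-up message for the back-pressure setting.
final class BackPressureLogMessage {
    static String describe(boolean enabled, Object strategy) {
        // Only mention the strategy when it is actually in effect.
        return enabled
                ? "Back-pressure is enabled with strategy " + strategy + "."
                : "Back-pressure is disabled";
    }
}
```

With this shape, a disabled configuration never prints the strategy, which removes the ambiguity reported above.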