[
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373624#comment-15373624
]
Sergio Bossa commented on CASSANDRA-9318:
-----------------------------------------
[~Stefania],
bq. if a message expires before it is sent, we consider this negatively for
that replica, since we increment the outgoing rate but not the incoming rate
when the callback expires, and still it may have nothing to do with the
replica: if the message was not sent, it may be due to the coordinator dealing
with too many messages.
Right, but there isn't much we can do without way more invasive changes.
Anyway, I don't think that's actually a problem: if the coordinator is
overloaded, we'll end up generating too many hints and failing with
{{OverloadedException}} (this time with its original meaning), so we should be
covered.
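For clarity, this is roughly the kind of guard I mean (a minimal sketch, not
the actual {{StorageProxy}} code; the counter and threshold names here are
made up):
{code:java}
import java.util.concurrent.atomic.AtomicLong;

import org.apache.cassandra.exceptions.OverloadedException;

// Minimal sketch only: reject writes once the coordinator already has too many
// in-flight hints, instead of buffering them forever. Names are illustrative.
public final class HintOverloadGuard
{
    private final AtomicLong hintsInProgress = new AtomicLong();
    private final long maxHintsInProgress;

    public HintOverloadGuard(long maxHintsInProgress)
    {
        this.maxHintsInProgress = maxHintsInProgress;
    }

    public void beforeHinting(int hintsToWrite)
    {
        // Fail fast: an overloaded coordinator surfaces as OverloadedException
        // rather than as unbounded hint accumulation.
        if (hintsInProgress.get() + hintsToWrite > maxHintsInProgress)
            throw new OverloadedException("Too many in-flight hints on the coordinator");
        hintsInProgress.addAndGet(hintsToWrite);
    }

    public void afterHinting(int hintsWritten)
    {
        hintsInProgress.addAndGet(-hintsWritten);
    }
}
{code}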
bq. I also observe that if a replica has a low rate, then we may block when
acquiring the limiter, and this will indirectly throttle all following
replicas, even if they were ready to receive mutations sooner.
See my answer at the end.
bq. AbstractWriteResponseHandler sets the start time in the constructor, so the
time spent acquiring a rate limiter for slow replicas counts towards the total
time before the coordinator throws a write timeout exception.
See my answer at the end.
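To make both concerns concrete, this is a minimal sketch of the pattern they
describe (assuming a Guava {{RateLimiter}} per replica; the method below is
illustrative, not the patch code):
{code:java}
import java.net.InetAddress;
import java.util.Map;

import com.google.common.util.concurrent.RateLimiter;

// Minimal sketch only: sending to replicas serially while acquiring a
// per-replica rate limiter. A very low rate for one replica makes acquire()
// block for a long time, delaying every replica that follows it, and since the
// response handler records its start time up front, all of that waiting also
// counts against the write timeout budget.
public final class ThrottledSendExample
{
    public static void sendToAll(Map<InetAddress, RateLimiter> limiterPerReplica,
                                 Iterable<InetAddress> replicas)
    {
        long start = System.nanoTime(); // stands in for the handler's start time

        for (InetAddress replica : replicas)
        {
            // Blocks until a permit is available at this replica's current rate.
            limiterPerReplica.get(replica).acquire();

            // ... actually send the mutation to 'replica' here ...
        }

        // Time spent waiting on slow replicas' limiters above has already been
        // charged against the coordinator's write timeout.
        long elapsedNanos = System.nanoTime() - start;
    }
}
{code}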
bq. In SP.sendToHintedEndpoints(), we should apply backpressure only if the
destination is alive.
I know, I'm holding off on these changes until we settle on a plan for the
whole write path (in terms of what to do with the CL, the exception to be
thrown, etc.).
bq. Let's use UnavailableException since WriteFailureException indicates a
non-timeout failure when processing a mutation, and so it is not appropriate
for this case. For protocol V4 we cannot change UnavailableException, but for
V5 we should add a new parameter to it. At the moment it contains
<cl><required><alive>; we should add the number of overloaded replicas, so
that drivers can treat the two cases differently.
Does that mean we should advance the protocol version in this issue, or defer
that to a new issue?
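Just to visualise what the V5 change would amount to, here's a purely
hypothetical sketch of the extra field next to <cl><required><alive> (class
and field names are made up):
{code:java}
import org.apache.cassandra.db.ConsistencyLevel;

// Hypothetical sketch only: what a V5 Unavailable error body could carry.
// On the wire it would become <cl><required><alive><overloaded> instead of
// the current <cl><required><alive>.
public final class UnavailableInfoV5
{
    public final ConsistencyLevel consistency; // <cl>
    public final int required;                 // replicas required by the CL
    public final int alive;                    // replicas known to be alive
    public final int overloaded;               // new: replicas rejected by backpressure

    public UnavailableInfoV5(ConsistencyLevel consistency, int required, int alive, int overloaded)
    {
        this.consistency = consistency;
        this.required = required;
        this.alive = alive;
        this.overloaded = overloaded;
    }
}
{code}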
bq. Marking messages as throttled would let the replica know if backpressure
was enabled, that's true, but it also makes the existing mechanism even more
complex.
How so? In implementation terms, it should be literally as easy as:
1) Add a byte parameter to {{MessageOut}}.
2) Read that byte parameter from {{MessageIn}} and, if it is set, skip
dropping the mutation replica-side.
3) If possible (I haven't checked), when a "late" response is received on the
coordinator, try to cancel the related hint.
Do you see any complexity I'm missing there?
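To illustrate, steps 1 and 2 could look roughly like the following (assuming
the existing {{MessageOut.withParameter()}} / {{MessageIn.parameters}}
mechanism; the parameter name is made up):
{code:java}
import org.apache.cassandra.net.MessageIn;
import org.apache.cassandra.net.MessageOut;

// Rough sketch of steps 1 and 2 only; "THROTTLED" is a made-up parameter name
// and the surrounding plumbing is omitted.
public final class ThrottleFlagExample
{
    private static final String THROTTLED = "THROTTLED";
    private static final byte[] ONE_BYTE = new byte[1];

    // Coordinator side (step 1): tag the outgoing mutation as throttled.
    public static <T> MessageOut<T> markThrottled(MessageOut<T> message)
    {
        return message.withParameter(THROTTLED, ONE_BYTE);
    }

    // Replica side (step 2): if the coordinator throttled the mutation, don't
    // drop it just because it sat in the queue past the write RPC timeout.
    public static boolean shouldSkipDropping(MessageIn<?> message)
    {
        return message.parameters.containsKey(THROTTLED);
    }
}
{code}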
bq. dropping mutations that have been in the queue for longer than the RPC
write timeout is done not only to shed load on the replica, but also to avoid
wasting resources to perform a mutation when the coordinator has already
returned a timeout exception to the client.
This is very true, and that's why I said it's a bit of a wild idea. To be
fair, the same already holds outside of back-pressure: even now it is possible
to return a write timeout to clients and still have some or all mutations
applied. In the end, it might be good to make such behaviour optional, as the
advantage would be increased consistency at the expense of more resource
consumption, which is a tradeoff some users might want to make. To be clear,
I'm not strictly lobbying to implement it, just trying to reason about pros
and cons.
bq. I still have concerns regarding additional write timeout exceptions and
whether an overloaded or slow replica can slow everything down.
These are valid concerns of course. Given similar concerns from [~jbellis],
I'm working on some changes to avoid write timeouts caused by healthy replicas
being unnaturally throttled by unhealthy ones, and, depending on [~jbellis]'s
answer to my last comment above, maybe to only apply back-pressure when the CL
is not met.
Stay tuned.
> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
> Key: CASSANDRA-9318
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
> Project: Cassandra
> Issue Type: Improvement
> Components: Local Write-Read Paths, Streaming and Messaging
> Reporter: Ariel Weisberg
> Assignee: Sergio Bossa
> Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png,
> limit.btm, no_backpressure.png
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding
> bytes and requests and, if it reaches a high watermark, disable read on
> client connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't
> introduce other issues.
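For illustration only, a minimal sketch of the high/low watermark idea at the
connection layer (assuming Netty's {{setAutoRead()}}; counter and threshold
names are made up):
{code:java}
import java.util.concurrent.atomic.AtomicLong;

import io.netty.channel.Channel;

// Illustrative sketch only: track outstanding request bytes and toggle
// auto-read on client channels around a high/low watermark.
public final class InflightLimiter
{
    private final AtomicLong inflightBytes = new AtomicLong();
    private final long highWatermark;
    private final long lowWatermark;

    public InflightLimiter(long highWatermark, long lowWatermark)
    {
        this.highWatermark = highWatermark;
        this.lowWatermark = lowWatermark;
    }

    public void onRequestReceived(Channel channel, long requestBytes)
    {
        if (inflightBytes.addAndGet(requestBytes) >= highWatermark)
            channel.config().setAutoRead(false); // stop reading from the client
    }

    public void onRequestCompleted(Channel channel, long requestBytes)
    {
        if (inflightBytes.addAndGet(-requestBytes) <= lowWatermark)
            channel.config().setAutoRead(true); // resume reading
    }
}
{code}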
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)