[
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15356800#comment-15356800
]
Stefania commented on CASSANDRA-9318:
-------------------------------------
Well, I read all of the discussion above and it took perhaps longer than
actually reviewing the code! :) If I understood correctly, what we are trying
to achieve with this new approach is reducing the number of dropped mutations
rather than preventing OOM, but in a way that doesn't cause OOM, by invoking a
rate-limiter acquire in a Netty shared worker pool thread, which would slow
down our ability to handle new client requests, as well as throw OEs once in a
while when a low water-mark is reached. Is this a fair description? Assuming we
can prove this in a larger scenario, I am +1 on the approach, but I would think
it should go to trunk rather than 3.0? In the CCM test performed with two
nodes, what was the difference in write timeouts?
In terms of testing, here is what I would suggest (cc [~cassandra-te] for any
help they may be able to provide):
* verify we can reduce the number of dropped mutations in a larger (5-10 nodes)
cluster with multiple clients writing simultaneously
* some cstar perf tests to ensure ops per second are not degraded, for both
reads and writes
* the dtests should be run with and without backpressure enabled
* we should do a bulk load test, for example with cqlsh COPY FROM
As for the code review, I've addressed some nits, mostly comments and trailing
spaces, directly
[here|https://github.com/stef1927/cassandra/commit/334509c10a5f4f759550fb64902aabc8c4a4f40c].
The code is really of excellent quality and very well unit tested, so I didn't
really find much, and what I did find is mostly suggestions. I will however
have another pass next week, now that I am more familiar with the feature.
h5. conf/cassandra.yaml
I've edited the comments slightly, mostly to spell out for users why they may
need a back pressure strategy, and to put the emphasis on the default
implementation first, then say how to create new strategies. As you mentioned,
new strategies may require new state parameters anyway, so it's not clear how
easy it will be to add new strategies. You may be able to improve on my
comments further.
Should we perhaps group all settings under something like back_pressure_options?
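Just to illustrate the grouping idea, something along these lines (the nested
key names and the example parameter below are made up for illustration, not
what is in the patch):
{code}
# Hypothetical grouping of the flat back-pressure settings; key names and the
# example parameter are illustrative only.
back_pressure_options:
    enabled: false
    strategy:
        - class_name: org.apache.cassandra.net.RateBasedBackPressure
          parameters:
              - high_ratio: 0.90
{code}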
back_pressure_timeout_override: should we call it what it is, the
back_pressure_window_size, and recommend in the comments that users increase
the write timeout, or is an override of the write timeout necessary for
correctness?
h5. src/java/org/apache/cassandra/config/Config.java
I moved the creation of the default parameterized class to
RateBasedBackPressure, to avoid polluting Config with strategy-specific
constants.
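Roughly what I had in mind, as a sketch only (the factory method name and the
parameter key below are made up, the real defaults stay whatever the patch
defines):
{code}
import com.google.common.collect.ImmutableMap;
import org.apache.cassandra.config.ParameterizedClass;

// Sketch: the strategy owns its own default ParameterizedClass, so Config.java
// only references it and carries no strategy-specific constants.
public class RateBasedBackPressure // (implements BackPressureStrategy in the patch)
{
    // Hypothetical parameter key, shown here only to illustrate the shape.
    static final String HIGH_RATIO = "high_ratio";

    public static ParameterizedClass withDefaultParams()
    {
        return new ParameterizedClass(RateBasedBackPressure.class.getName(),
                                      ImmutableMap.of(HIGH_RATIO, "0.90"));
    }
}
{code}
Config.java would then simply initialize its field with something like
RateBasedBackPressure.withDefaultParams().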
h5. src/java/org/apache/cassandra/net/BackPressureState.java
Should we use a dedicated package? It would require a couple of setters and
getters in the state to keep the properties non-public, which I would prefer
anyway.
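For example, keeping the fields private only needs something like the
following (the field and method names here are invented, just to show the
accessor shape, not the actual members in the patch):
{code}
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: the real state holds different members; the point is that
// hiding them behind accessors stays cheap and keeps the properties non-public.
public class BackPressureState
{
    private final AtomicLong outgoingCount = new AtomicLong();
    private final AtomicLong incomingCount = new AtomicLong();

    public void onMessageSent()      { outgoingCount.incrementAndGet(); }
    public void onResponseReceived() { incomingCount.incrementAndGet(); }

    public long getOutgoingCount()   { return outgoingCount.get(); }
    public long getIncomingCount()   { return incomingCount.get(); }
}
{code}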
h5. src/java/org/apache/cassandra/net/BackPressureStrategy.java
As above, should we use a dedicated package?
h5. src/java/org/apache/cassandra/net/RateBasedBackPressure.java
OK - I've changed the code slightly to calculate backPressureWindowHalfSize in
seconds only once per call.
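In other words, something along these lines (variable and helper names are
illustrative, not the ones in the patch):
{code}
// Fragment sketch: convert the half window to seconds once per invocation,
// instead of repeating the conversion for every replica in the loop.
long windowHalfSizeInSeconds = TimeUnit.MILLISECONDS.toSeconds(backPressureWindowSizeInMs / 2);
for (BackPressureState state : states)
    updateIncomingRate(state, windowHalfSizeInSeconds); // hypothetical helper
{code}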
h5. src/java/org/apache/cassandra/utils/SystemTimeSource.java
Depending on the precision required: in trunk there is an approximate time
service with a precision of 10 ms, which is faster than calling
System.currentTimeMillis().
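If that precision is acceptable here, the implementation could simply delegate
to it; a rough sketch, assuming the trunk class is ApproximateTime and that
TimeSource exposes both a millisecond and a nanosecond clock:
{code}
import org.apache.cassandra.utils.ApproximateTime;

// Rough sketch only: adjust to whatever methods TimeSource actually declares.
public class ApproximateTimeSource implements TimeSource
{
    public long currentTimeMillis()
    {
        return ApproximateTime.currentTimeMillis(); // ~10 ms precision, cheaper than the system call
    }

    public long nanoTime()
    {
        return System.nanoTime();
    }
}
{code}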
h5. src/java/org/apache/cassandra/utils/TestRateLimiter.java
Is this still used?
h5. src/java/org/apache/cassandra/utils/TestTimeSource.java
I think we need to move it somewhere under the test sources.
> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
> Key: CASSANDRA-9318
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
> Project: Cassandra
> Issue Type: Improvement
> Components: Local Write-Read Paths, Streaming and Messaging
> Reporter: Ariel Weisberg
> Assignee: Sergio Bossa
> Attachments: backpressure.png, limit.btm, no_backpressure.png
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding
> bytes and requests and if it reaches a high watermark disable read on client
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't
> introduce other issues.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)