[
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360784#comment-15360784
]
Stefania commented on CASSANDRA-9318:
-------------------------------------
bq. For that specific test I've got no client timeouts at all, as I wrote at
ONE.
Sorry, I should have been clearer: what were the
{{write_request_timeout_in_ms}} and {{back_pressure_timeout_override}} yaml
settings?
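For reference, this is the kind of cassandra.yaml fragment I have in mind (the values are illustrative only, not recommendations; {{back_pressure_timeout_override}} is the option from your branch):

```yaml
# Illustrative cassandra.yaml fragment; values are examples, not recommendations.
write_request_timeout_in_ms: 2000
# Back-pressure window override introduced by this patch (name as in the branch):
back_pressure_timeout_override: 2000
```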
bq. Agreed with all your points. I'll see what I can do, but any help/pointers
will be very appreciated.
We can do the following:
bq. verify we can reduce the number of dropped mutations in a larger (5-10
nodes) cluster with multiple clients writing simultaneously
I will ask the TEs for help; more details to follow.
bq. some cstar perf tests to ensure ops per second are not degraded, both read
and writes
We can launch a comparison test [here|http://cstar.datastax.com]; 30M rows
should be enough. I can launch it for you if you don't have an account.
bq. the dtests should be run with and without backpressure enabled
This can be done by temporarily changing cassandra.yaml on your branch and then
launching the dtests.
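Concretely, the toggle could be a one-line yaml change on the branch before each dtest run (the option name here is an assumption on my part; substitute whatever the enable flag is called in your patch):

```yaml
# Hypothetical enable flag; use the actual option name from the branch.
# Run the full dtest suite once with true and once with false.
back_pressure_enabled: true
```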
bq. we should do a bulk load test, for example for cqlsh COPY FROM
I can take care of this. I don't expect problems, because COPY FROM should
contact the replicas directly; it's just a box I want to tick. Importing 5 to
10M rows with 3 nodes should be sufficient.
bq. Please send me a PR and I'll incorporate those in my branch
I couldn't create a PR; for some reason sbtourist/cassandra wasn't in the base
fork list. I've attached a patch to this ticket,
[^9318-3.0-nits-trailing-spaces.patch].
bq. I find the current layout effective and simple enough, but I'll not object
if you want to push those under a common "container" option.
The encryption options are what I was aiming at, but it's true that for
everything else we have a flat layout, so let's leave it as it is.
bq. I don't like much that name either, as it doesn't convey very well the
(double) meaning; making the back-pressure window the same as the write timeout
is not strictly necessary, but it makes the algorithm behave better in terms of
reducing dropped mutations as it gives replica more time to process its backlog
after the rate is reduced. Let me think about that a bit more, but I'd like to
avoid requiring the user to increase the write timeout manually, as again, it
reduces the effectiveness of the algorithm.
I'll let you think about it. Maybe a boolean property, true by default, that
clearly indicates the timeout is overridden, although this complicates things
somewhat.
bq. Sure I can switch to that on trunk, if you think it's worth
performance-wise (I can write a JMH test if there isn't one already).
The precision is only 10 milliseconds; if that is acceptable, it would be
interesting to see what the performance difference is.
bq. It is not used in any unit tests code, but it is used in my manual byteman
tests, and unfortunately I need it on the C* classpath; is that a problem to
keep it?
Sorry, I missed the byteman imports and helper. Let's just move it to the test
source folder and add a comment.
--
The rest of the CR points are fine.
One thing we did not confirm is whether you are happy committing this only to
trunk or whether you need this in 3.0. Strictly speaking 3.0 accepts only bug
fixes, not new features. However, this is an optional feature that solves a
problem (dropped mutations) and that is disabled by default, so we have a case
for an exception.
> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
> Key: CASSANDRA-9318
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
> Project: Cassandra
> Issue Type: Improvement
> Components: Local Write-Read Paths, Streaming and Messaging
> Reporter: Ariel Weisberg
> Assignee: Sergio Bossa
> Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png,
> limit.btm, no_backpressure.png
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding
> bytes and requests and if it reaches a high watermark disable read on client
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't
> introduce other issues.
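The high/low watermark idea in the description can be sketched roughly as follows. This is a hypothetical illustration, not the actual patch: the class and method names are made up, and a real implementation would toggle reads on the client channel (e.g. Netty's auto-read) rather than a flag.

```java
/**
 * Hypothetical sketch of bounding in-flight request bytes at the coordinator:
 * stop accepting reads from client connections above a high watermark and
 * resume once the backlog drains below a low watermark.
 */
class InFlightLimiter
{
    private final long highWatermarkBytes;
    private final long lowWatermarkBytes;
    private long inFlightBytes = 0;
    private boolean readsEnabled = true;

    InFlightLimiter(long highWatermarkBytes, long lowWatermarkBytes)
    {
        this.highWatermarkBytes = highWatermarkBytes;
        this.lowWatermarkBytes = lowWatermarkBytes;
    }

    /** Called when a request is accepted from a client connection. */
    synchronized void onRequestStart(long bytes)
    {
        inFlightBytes += bytes;
        if (inFlightBytes >= highWatermarkBytes)
            readsEnabled = false; // in practice: disable read on the client channel
    }

    /** Called when a request completes (or fails). */
    synchronized void onRequestEnd(long bytes)
    {
        inFlightBytes -= bytes;
        if (inFlightBytes <= lowWatermarkBytes)
            readsEnabled = true;  // re-enable reads once the backlog drains
    }

    synchronized boolean readsEnabled()
    {
        return readsEnabled;
    }
}
```

The two watermarks provide hysteresis, so reads don't flap on and off at a single threshold; whether pausing reads this way introduces other issues (e.g. client-side timeouts) is exactly the open question in the description.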
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)