[
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360784#comment-15360784
]
Stefania commented on CASSANDRA-9318:
-------------------------------------
bq. For that specific test I've got no client timeouts at all, as I wrote at
ONE.
Sorry, I should have been clearer: what were the
{{write_request_timeout_in_ms}} and {{back_pressure_timeout_override}} yaml
settings?
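For reference, this is the kind of cassandra.yaml fragment I have in mind (the values are illustrative only, not recommendations; {{back_pressure_timeout_override}} is the option from your branch):

```yaml
# Illustrative cassandra.yaml fragment; values are examples, not recommendations.
write_request_timeout_in_ms: 2000
# Back-pressure window override introduced by this patch (name as in the branch):
back_pressure_timeout_override: 2000
```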
bq. Agreed with all your points. I'll see what I can do, but any help/pointers
will be very appreciated.
We can do the following:
bq. verify we can reduce the number of dropped mutations in a larger (5-10
nodes) cluster with multiple clients writing simultaneously
I will ask the TEs for help; more details to follow.
bq. some cstar perf tests to ensure ops per second are not degraded, both read
and writes
We can launch a comparison test [here|http://cstar.datastax.com]; 30M rows
should be enough. I can launch it for you if you don't have an account.
bq. the dtests should be run with and without backpressure enabled
This can be done by temporarily changing cassandra.yaml on your branch and then
launching the dtests.
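Concretely, the toggle could be a one-line yaml change on the branch before each dtest run (the option name here is an assumption on my part; substitute whatever the enable flag is called in your patch):

```yaml
# Hypothetical enable flag; use the actual option name from the branch.
# Run the full dtest suite once with true and once with false.
back_pressure_enabled: true
```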
bq. we should do a bulk load test, for example for cqlsh COPY FROM
I can take care of this. I don't expect problems, because COPY FROM should
contact the replicas directly; it's just a box I want to tick. Importing 5 to
10M rows with 3 nodes should be sufficient.
bq. Please send me a PR and I'll incorporate those in my branch
I couldn't create a PR; for some reason sbtourist/cassandra wasn't in the base
fork list. I've attached a patch to this ticket,
[^9318-3.0-nits-trailing-spaces.patch].
bq. I find the current layout effective and simple enough, but I'll not object
if you want to push those under a common "container" option.
The encryption options are what I was aiming at, but it's true that for
everything else we have a flat layout, so let's leave it as it is.
bq. I don't like much that name either, as it doesn't convey very well the
(double) meaning; making the back-pressure window the same as the write timeout
is not strictly necessary, but it makes the algorithm behave better in terms of
reducing dropped mutations as it gives replica more time to process its backlog
after the rate is reduced. Let me think about that a bit more, but I'd like to
avoid requiring the user to increase the write timeout manually, as again, it
reduces the effectiveness of the algorithm.
I'll let you think about it. Maybe a boolean property, true by default, that
clearly indicates the timeout is overridden, although this complicates things
somewhat.
bq. Sure I can switch to that on trunk, if you think it's worth
performance-wise (I can write a JMH test if there isn't one already).
The precision is only 10 milliseconds; if that is acceptable, it would be
interesting to see what the performance difference is.
bq. It is not used in any unit tests code, but it is used in my manual byteman
tests, and unfortunately I need it on the C* classpath; is that a problem to
keep it?
Sorry, I missed the byteman imports and helper. Let's just move it to the test
source folder and add a comment.
--
The rest of the CR points are fine.
One thing we did not confirm is whether you are happy committing this only to
trunk or whether you need this in 3.0. Strictly speaking 3.0 accepts only bug
fixes, not new features. However, this is an optional feature that solves a
problem (dropped mutations) and that is disabled by default, so we have a case
for an exception.
> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
> Key: CASSANDRA-9318
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
> Project: Cassandra
> Issue Type: Improvement
> Components: Local Write-Read Paths, Streaming and Messaging
> Reporter: Ariel Weisberg
> Assignee: Sergio Bossa
> Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png,
> limit.btm, no_backpressure.png
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding
> bytes and requests and if it reaches a high watermark disable read on client
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't
> introduce other issues.
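The high/low watermark idea in the description can be sketched roughly as follows. This is a hypothetical illustration, not the actual patch: the class and method names are made up, and a real implementation would toggle reads on the client channel (e.g. Netty's auto-read) rather than a flag.

```java
/**
 * Hypothetical sketch of bounding in-flight request bytes at the coordinator:
 * stop accepting reads from client connections above a high watermark and
 * resume once the backlog drains below a low watermark.
 */
class InFlightLimiter
{
    private final long highWatermarkBytes;
    private final long lowWatermarkBytes;
    private long inFlightBytes = 0;
    private boolean readsEnabled = true;

    InFlightLimiter(long highWatermarkBytes, long lowWatermarkBytes)
    {
        this.highWatermarkBytes = highWatermarkBytes;
        this.lowWatermarkBytes = lowWatermarkBytes;
    }

    /** Called when a request is accepted from a client connection. */
    synchronized void onRequestStart(long bytes)
    {
        inFlightBytes += bytes;
        if (inFlightBytes >= highWatermarkBytes)
            readsEnabled = false; // in practice: disable read on the client channel
    }

    /** Called when a request completes (or fails). */
    synchronized void onRequestEnd(long bytes)
    {
        inFlightBytes -= bytes;
        if (inFlightBytes <= lowWatermarkBytes)
            readsEnabled = true;  // re-enable reads once the backlog drains
    }

    synchronized boolean readsEnabled()
    {
        return readsEnabled;
    }
}
```

The two watermarks provide hysteresis, so reads don't flap on and off at a single threshold; whether pausing reads this way introduces other issues (e.g. client-side timeouts) is exactly the open question in the description.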
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)