[
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605345#comment-14605345
]
Benedict commented on CASSANDRA-9318:
-------------------------------------
bq. I think you're \[Benedict\] overselling how scary it is to stop reading new
requests until we can free up some memory from MS.
The problem is that freeing up memory can be constrained by one or a handful of
_dead_ nodes. Not only can we stop accepting work; we can significantly reduce
cluster throughput as a result of a *single* timed-out node. I'm not
overselling anything, although I may have a different risk analysis than you do.
Take a simple mathematical thought experiment: we have a four-node cluster
(pretty common), with RF=3, serving 100k op/s per coordinator; each operation
occupies around 2KB in memory as a Mutation (again, pretty typical). Ordinary
round-trip time is 10ms (also pretty typical).
So, under normal load we would see around 2MB of data maintained for our
in-flight queries across each coordinator. But now one node goes down. With
RF=3 on four nodes, this node is a replica for 3/4 of all writes to the
cluster, so we see 150MB/s of data accumulate in each coordinator. Our limit is
probably no more than 300MB (probably lower), so each coordinator fills up in
about 2s. Our timeout is 10s, so we then have 8s during which nothing can be
done, across the cluster, due to one node's death. After that 8s has elapsed,
we get another flurry. Then another 8s of nothing. Even with a CL of ONE.
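The arithmetic behind those figures (a back-of-envelope check, using the same assumptions as above):

```python
# Back-of-envelope check of the thought experiment (sizes in bytes).
ops_per_sec = 100_000          # per-coordinator write throughput
mutation_bytes = 2 * 1024      # ~2KB per Mutation in memory
rtt = 0.010                    # 10ms ordinary round-trip time
rf, nodes = 3, 4

# Steady state: data held for in-flight queries per coordinator.
steady_state = ops_per_sec * mutation_bytes * rtt              # ~2MB

# One node down: it is a replica for rf/nodes of all writes, and those
# entries cannot be freed until the timeout fires.
accumulation_rate = (rf / nodes) * ops_per_sec * mutation_bytes  # ~150MB/s
limit = 300 * 1024 * 1024      # ~300MB coordinator memory bound
timeout = 10.0                 # write timeout in seconds

time_to_fill = limit / accumulation_rate   # coordinator fills in ~2s
stall = timeout - time_to_fill             # ~8s accepting nothing

print(f"steady ~{steady_state / 1e6:.1f}MB, "
      f"fill in {time_to_fill:.1f}s, stall {stall:.1f}s")
# → steady ~2.0MB, fill in 2.0s, stall 8.0s
```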
This really is fundamentally opposed to the whole idea of Cassandra, and I
cannot emphasize enough how strongly I am against it, except as a literal last
resort when all other strategies have failed.
bq. Hinting is better than leaving things in an unknown state but it's not
something we should opt users into if we have a better option, since it
basically turns the write into CL.ANY.
I was under the impression we had moved on to talking about ACK'd writes. I'm
not suggesting we ack the handler with success.
What we do with unack'd writes is actually less important, and we have much
freer rein with them. We could throw OverloadedException (OE). We could block,
as you suggest, since these should be more evenly distributed.
However, I would prefer we do both, i.e., when we run out of room in the
coordinator, we should look to see if there are any nodes that have well in
excess of their fair share of entries waiting for a response. Let's call the
set of such nodes N:
# if N is empty, we block consumption of new writes, as you propose
# otherwise, we first evict those handlers that have been ACK'd to the client and can be safely hinted (and hint them)
# if this isn't enough, we evict handlers that, if all the nodes in N were to fail, would break the CL we are waiting on, and we throw OE
Step 3 is necessary both for CL.ALL and for the scenario where two failing
nodes have significant overlap.
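A minimal sketch of the three steps, under stated assumptions: {{Handler}}, {{FAIR_SHARE_FACTOR}}, and the callbacks are hypothetical illustrations of the policy, not Cassandra's actual internals.

```python
from dataclasses import dataclass, field

# Hypothetical threshold for "well in excess of fair share" (assumption).
FAIR_SHARE_FACTOR = 2.0

@dataclass
class Handler:
    cl_required: int                  # acks still needed to satisfy the CL
    acked_to_client: bool             # response already sent to the client?
    pending: set = field(default_factory=set)  # replicas yet to respond

def on_coordinator_full(handlers, block_new_writes, hint, throw_oe):
    # Count pending entries per node; N = nodes well over their fair share.
    counts = {}
    for h in handlers:
        for node in h.pending:
            counts[node] = counts.get(node, 0) + 1
    fair = sum(counts.values()) / max(len(counts), 1)
    N = {n for n, c in counts.items() if c > FAIR_SHARE_FACTOR * fair}

    if not N:
        block_new_writes()            # step 1: no outlier, stop consuming
        return
    for h in list(handlers):          # step 2: hint what is safely hintable
        if h.acked_to_client and h.pending & N:
            handlers.remove(h)
            hint(h)
    for h in list(handlers):          # step 3: evict handlers whose CL would
        if len(h.pending - N) < h.cl_required:  # break if all of N failed
            handlers.remove(h)
            throw_oe(h)
```

With one node hoarding most pending entries, step 2 frees the ACK'd handlers waiting on it, step 3 expires only those whose CL genuinely cannot be met, and everything else keeps flowing.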
> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
> Key: CASSANDRA-9318
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Fix For: 2.1.x, 2.2.x
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding
> bytes and requests and if it reaches a high watermark disable read on client
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't
> introduce other issues.
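The watermark scheme in the issue description could be sketched as follows; the class and byte counts are illustrative assumptions, not Cassandra's actual code, and the real change would toggle reads on the native-protocol connections.

```python
import threading

class InflightLimiter:
    """Hypothetical high/low watermark gate over outstanding request bytes."""

    def __init__(self, high_bytes, low_bytes):
        self.high, self.low = high_bytes, low_bytes
        self.outstanding = 0          # bytes of in-flight requests
        self.reads_enabled = True     # whether client sockets are being read
        self.lock = threading.Lock()

    def on_request(self, nbytes):
        with self.lock:
            self.outstanding += nbytes
            if self.reads_enabled and self.outstanding >= self.high:
                self.reads_enabled = False   # stop reading client connections

    def on_complete(self, nbytes):
        with self.lock:
            self.outstanding -= nbytes
            if not self.reads_enabled and self.outstanding <= self.low:
                self.reads_enabled = True    # resume once below low watermark
```

The gap between the two watermarks provides hysteresis, so reads are not toggled on and off on every request boundary.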
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)