[ 
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534997#comment-14534997
 ] 

Ariel Weisberg commented on CASSANDRA-9318:
-------------------------------------------

I looked at the issues you linked and didn't come away with anything that 
looks like leaky queues. Can you describe what that is? Is it shedding from 
the queues based on resources? That makes sense to me, mostly as a way to 
prevent the initial overload at processing nodes until the cluster can adapt 
to the disparity between requested capacity and actual capacity. If leaked 
items resulted in an error response, that would provide feedback to the 
coordinator and free up resources there.

Given the contract of CL=1 (or even quorum), you are right that there is 
nothing to be gained by bounding the number of in-flight requests at a 
coordinator by not reading requests from clients. At CL=1, and given the way I 
hear people think about availability in C*, I think what you want is to get 
better at failing over to hinting before the coordinator or processing node 
overloads. Under overload conditions, CL=1 is basically synonymous with 
writing hints, right?

bq.  which may leave us open to a multiplying effect of cluster overload, with 
each node dropping different requests, possibly leading to only a tiny fraction 
of requests being serviced to their required CL across the cluster. I'm not 
sure how we can best model this risk, or avoid it without notifying 
coordinators of the drop of a message, and I don't see that being delivered for 
2.1

Maybe this is a congestion control problem? If we piggybacked congestion 
information on responses, maybe we could make better decisions about new 
requests, such as rejecting a percentage or going straight to hints before 
resources have been committed across the cluster?
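
A rough sketch of that idea, assuming each response carries a congestion 
value between 0 and 1 and the coordinator keeps one smoothed estimate per 
replica. The class, the enum, and the EWMA smoothing are all hypothetical 
choices for illustration; none of this is an existing Cassandra API.

```java
// Hypothetical sketch: these names are illustrative, not Cassandra APIs.
final class CongestionAwareCoordinator {
    enum Decision { SEND_TO_REPLICA, WRITE_HINT }

    private final double smoothing;   // EWMA weight in (0, 1]
    private double congestion = 0.0;  // smoothed estimate in [0, 1]

    CongestionAwareCoordinator(double smoothing) {
        if (!(smoothing > 0.0 && smoothing <= 1.0)) {
            throw new IllegalArgumentException("smoothing must be in (0, 1]");
        }
        this.smoothing = smoothing;
    }

    /** Fold the congestion value piggybacked on a response into the estimate. */
    synchronized void onResponse(double reportedCongestion) {
        double clamped = Math.max(0.0, Math.min(1.0, reportedCongestion));
        congestion = smoothing * clamped + (1.0 - smoothing) * congestion;
    }

    /**
     * Decide what to do with a new write. The caller passes a sample drawn
     * uniformly from [0, 1), so the fraction of writes diverted to hints
     * tracks the current congestion estimate.
     */
    synchronized Decision decide(double sample) {
        return sample < congestion ? Decision.WRITE_HINT : Decision.SEND_TO_REPLICA;
    }

    synchronized double congestion() {
        return congestion;
    }
}
```

Passing the random sample in rather than drawing it inside keeps the decision 
deterministic for testing. Rejecting a percentage outright would be the same 
comparison with a different outcome.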

Once something is hinted, you can trickle out the load to match the actual 
capacity of the node being hinted for. I know this conflicts with hints not 
being fast, but hints are just a queue and could be very fast. I haven't 
looked at the in-progress work on hints.
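
As a sketch of what trickling out could mean, here is a small token-bucket 
pacer that gates hint replay to a target. It assumes a fixed rate in hints 
per second and a caller-supplied clock. HintPacer and its parameters are 
hypothetical, and this is not how Cassandra's hint delivery is implemented.

```java
// Hypothetical sketch: a token bucket that paces hint replay.
final class HintPacer {
    private final double hintsPerSecond; // assumed capacity of the target
    private final double maxBurst;       // cap on accumulated tokens
    private double tokens = 0.0;         // start empty so replay ramps up
    private long lastNanos;

    HintPacer(double hintsPerSecond, double maxBurst, long nowNanos) {
        if (hintsPerSecond <= 0.0 || maxBurst < 1.0) {
            throw new IllegalArgumentException("need rate > 0 and burst >= 1");
        }
        this.hintsPerSecond = hintsPerSecond;
        this.maxBurst = maxBurst;
        this.lastNanos = nowNanos;
    }

    /** Returns true if one hint may be replayed at the given time. */
    synchronized boolean tryAcquire(long nowNanos) {
        // Refill according to elapsed time, ignoring a clock that moves back.
        long elapsed = Math.max(0L, nowNanos - lastNanos);
        lastNanos = Math.max(lastNanos, nowNanos);
        tokens = Math.min(maxBurst, tokens + elapsed / 1e9 * hintsPerSecond);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

A replay loop would call tryAcquire(System.nanoTime()) before sending each 
hint and back off when it returns false. Matching actual capacity would mean 
adjusting the rate from feedback, which this sketch leaves out.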

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 2.1.x
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster 
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding 
> bytes and requests and if it reaches a high watermark disable read on client 
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't 
> introduce other issues.
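
The watermark scheme in the description above can be sketched roughly as 
follows. This is a minimal illustration, not Cassandra code: the class 
InFlightLimiter, its method names, and the byte-only accounting are all 
assumptions made for the example. A real version would also count requests 
and would actually toggle reads on the client connections.

```java
// Hypothetical sketch: high/low watermark tracking of in-flight bytes.
final class InFlightLimiter {
    private final long lowWatermarkBytes;
    private final long highWatermarkBytes;
    private long inFlightBytes = 0;
    private boolean readsPaused = false;

    InFlightLimiter(long lowWatermarkBytes, long highWatermarkBytes) {
        if (lowWatermarkBytes < 0 || lowWatermarkBytes >= highWatermarkBytes) {
            throw new IllegalArgumentException("need 0 <= low < high");
        }
        this.lowWatermarkBytes = lowWatermarkBytes;
        this.highWatermarkBytes = highWatermarkBytes;
    }

    /**
     * Record an admitted request. Returns true if client reads should be
     * paused, which happens once the high watermark is reached.
     */
    synchronized boolean onRequestStart(long bytes) {
        inFlightBytes += bytes;
        if (inFlightBytes >= highWatermarkBytes) {
            readsPaused = true;
        }
        return readsPaused;
    }

    /**
     * Record a completed request. Reads stay paused until the total drops
     * below the low watermark, so the state does not flap near the limit.
     */
    synchronized boolean onRequestComplete(long bytes) {
        inFlightBytes -= bytes;
        if (readsPaused && inFlightBytes < lowWatermarkBytes) {
            readsPaused = false;
        }
        return readsPaused;
    }

    synchronized long inFlightBytes() {
        return inFlightBytes;
    }
}
```

The gap between the two watermarks is the point of the design: with a single 
threshold, reads would be disabled and re-enabled on nearly every request 
once load hovers around the limit.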



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
