[
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534997#comment-14534997
]
Ariel Weisberg commented on CASSANDRA-9318:
-------------------------------------------
I looked at the issues you linked and didn't come away with anything that
looks like leaky queues. Can you describe what that is? Is it shedding from
the queues based on resources? That mostly makes sense to me as a way to
prevent the initial overload at processing nodes until the cluster can adapt
to the disparity between requested capacity and actual capacity. If leaked
items resulted in an error response, that would aid feedback to the
coordinator and free up resources there.
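To make the question concrete, here is one possible reading of a "leaky queue", sketched in Java. It is only an assumption about what is meant: a bounded queue that sheds the incoming item when full and reports it through a callback, which a processing node could use to send an error response back to the coordinator. The names LeakyQueue and onShed are hypothetical and do not come from the Cassandra code base.

```java
import java.util.ArrayDeque;
import java.util.function.Consumer;

/** Hypothetical sketch of a bounded queue that sheds work instead of blocking. */
final class LeakyQueue<T> {
    private final ArrayDeque<T> items = new ArrayDeque<>();
    private final int capacity;
    private final Consumer<T> onShed; // assumed hook, e.g. send an error response

    LeakyQueue(int capacity, Consumer<T> onShed) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
        this.onShed = onShed;
    }

    /** Enqueues the item, or sheds it and notifies the callback when the queue is full. */
    synchronized boolean offer(T item) {
        if (items.size() >= capacity) {
            onShed.accept(item); // the "leak": the item is dropped, but not silently
            return false;
        }
        items.addLast(item);
        return true;
    }

    /** Returns the oldest queued item, or null if the queue is empty. */
    synchronized T poll() {
        return items.pollFirst();
    }
}
```

Shedding the newest item is an arbitrary choice here; dropping the oldest item, or dropping based on bytes rather than item count, would fit the same shape.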
Given the contract of CL=1 (or even quorum), you are right that there is
nothing to be gained by bounding the number of in-flight requests at a
coordinator by not reading requests from clients. At CL=1, and given the way I
hear people think about availability in C*, I think what you want is to get
better at failing over to hinting before the coordinator or processing node
overloads. Under overload conditions, CL=1 is basically synonymous with
writing hints, right?
bq. which may leave us open to a multiplying effect of cluster overload, with
each node dropping different requests, possibly leading to only a tiny fraction
of requests being serviced to their required CL across the cluster. I'm not
sure how we can best model this risk, or avoid it without notifying
coordinators of the drop of a message, and I don't see that being delivered for
2.1
Maybe this is a congestion control problem? If we piggybacked congestion
information on responses, maybe we could make better decisions about new
requests, such as rejecting a percentage of them or going straight to hints
before resources have been committed across the cluster?
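As a rough illustration of the piggybacking idea, the Java sketch below has a coordinator remember the last congestion value each replica reported and divert a matching fraction of new requests straight to hints. Everything in it is an assumption made for the example: the congestion value in the range 0.0 to 1.0, the class name CongestionAwareRouter, and the idea of using the value directly as a diversion probability. No such field exists in the messaging protocol today.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.DoubleSupplier;

/** Hypothetical sketch: divert a fraction of requests based on reported congestion. */
final class CongestionAwareRouter {
    enum Decision { SEND, HINT }

    // Last congestion value (0.0 = idle, 1.0 = saturated) piggybacked by each replica.
    private final Map<String, Double> congestionByReplica = new ConcurrentHashMap<>();
    private final DoubleSupplier random; // injected so the behaviour can be tested

    CongestionAwareRouter(DoubleSupplier random) {
        this.random = random;
    }

    /** Records the congestion value carried on a response (assumed protocol field). */
    void onResponse(String replica, double congestion) {
        double clamped = Math.max(0.0, Math.min(1.0, congestion));
        congestionByReplica.put(replica, clamped);
    }

    /** Diverts to hints with probability equal to the replica's reported congestion. */
    Decision decide(String replica) {
        double congestion = congestionByReplica.getOrDefault(replica, 0.0);
        return random.getAsDouble() < congestion ? Decision.HINT : Decision.SEND;
    }
}
```

In use, the coordinator would pass something like `ThreadLocalRandom.current()::nextDouble` as the random source. The decision is made before any resources are committed to the request, which is the point of the suggestion above.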
Once something is hinted, you can trickle out the load to match the actual
capacity of the thing being hinted. I know this conflicts with hints not being
fast, but hints are just a queue and could be very fast. I haven't looked at
the in-progress work on hints.
> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
> Key: CASSANDRA-9318
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Fix For: 2.1.x
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding
> bytes and requests and if it reaches a high watermark disable read on client
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't
> introduce other issues.
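
The watermark scheme in the description could be sketched roughly as follows in Java. This is a minimal illustration under stated assumptions, not the implementation: the class name InFlightLimiter and its methods are hypothetical, and the caller is assumed to act on the returned flag by enabling or disabling reads on client connections (with Netty, for instance, through `channel.config().setAutoRead(...)`).

```java
/** Hypothetical sketch of high/low watermark tracking for in-flight requests. */
final class InFlightLimiter {
    private final long lowBytes, highBytes;
    private final int lowRequests, highRequests;
    private long inFlightBytes;
    private int inFlightRequests;
    private boolean readsPaused;

    InFlightLimiter(long lowBytes, long highBytes, int lowRequests, int highRequests) {
        if (lowBytes < 0 || lowBytes >= highBytes || lowRequests < 0 || lowRequests >= highRequests)
            throw new IllegalArgumentException("low watermarks must be below high watermarks");
        this.lowBytes = lowBytes;
        this.highBytes = highBytes;
        this.lowRequests = lowRequests;
        this.highRequests = highRequests;
    }

    /** Accounts for a new request; returns true if client reads should stay enabled. */
    synchronized boolean onRequestStart(long bytes) {
        inFlightBytes += bytes;
        inFlightRequests++;
        // Pause as soon as either counter reaches its high watermark.
        if (inFlightBytes >= highBytes || inFlightRequests >= highRequests)
            readsPaused = true;
        return !readsPaused;
    }

    /** Accounts for a finished request; returns true if client reads may be enabled. */
    synchronized boolean onRequestComplete(long bytes) {
        inFlightBytes -= bytes;
        inFlightRequests--;
        // Resume only once both counters are back at or below their low watermarks.
        if (readsPaused && inFlightBytes <= lowBytes && inFlightRequests <= lowRequests)
            readsPaused = false;
        return !readsPaused;
    }
}
```

The gap between the low and high watermarks provides hysteresis, so reads are not toggled on every request when load hovers near the limit.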