[ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536307#comment-14536307 ]
Benedict edited comment on CASSANDRA-9318 at 5/9/15 5:23 PM:
-------------------------------------------------------------

bq. Where? Are you talking about the hint limit?

I was, and I realise that was a mistake; I didn't fully understand the existing logic (and your proposal took me by surprise). Now that I do, I think I understand what you are proposing. There are a few problems that I see with it, though:

# the cluster as a whole, especially in large clusters, can still send a _lot_ of requests to a single node
# it has the opposite impact of (and likely prevents) CASSANDRA-3852, with older operations completely blocking newer ones
# it might mean a lot more OE than users are used to during temporary blips, pushing problems down to clients, when the cluster is actually quite capable of coping (through hinting)
#* It seems like this would in fact seriously compromise our "A" property, with any failure for any node in a token range rapidly making the entire token range unavailable for writes\*
# tuning it is hard; network latencies, query processing times, and cluster size (which changes over time) will each impact it

I'm wary about a feature like this, when we could simply improve our current work shedding to make it more robust (MessagingService, MUTATION stage and ExpiringMap all, effectively, shed; just not with sufficient predictability), but I think I've made all my concerns sufficiently clear so I'll leave it with you.

\* At the very least we would have to first fall back to hints, rather than throwing OE, and wait for hints to saturate before throwing (AFAICT). In which case we're _in effect_ introducing "LIFO-leaky" pruning of the ExpiringMap, MS, and the receiving node's MUTATION stage, but under a new mechanism (as opposed to inline FIFO? (tbd) pruning).
I don't really have anything against this, since it is functionally equivalent, although I think FIFO-pruning is preferable; having fewer pruning mechanisms is probably preferable; these mechanisms would apply more universally; and they would insulate the node from the many-to-one effect (by making the MUTATION stage itself robust to overload).

was (Author: benedict):
bq. Where? Are you talking about the hint limit?

I was, and I realise that was a mistake; I didn't fully understand the existing logic (and your proposal took me by surprise). Now that I do, I think I understand what you are proposing. There are a few problems that I see with it, though:

# the cluster as a whole, especially in large clusters, can still send a _lot_ of requests to a single node
# it has the opposite impact of (and likely prevents) CASSANDRA-3852, with older operations completely blocking newer ones
# it might mean a lot more OE than users are used to during temporary blips, pushing problems down to clients, when the cluster is actually quite capable of coping (through hinting)
# tuning it is hard; network latencies, query processing times, and cluster size (which changes over time) will each impact it

I'm wary about a feature like this, when we could simply improve our current work shedding to make it more robust (MessagingService, MUTATION stage and ExpiringMap all, effectively, shed; just not with sufficient predictability), but I think I've made all my concerns sufficiently clear so I'll leave it with you.

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 2.1.x
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding
> bytes and requests and if it reaches a high watermark disable read on client
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't
> introduce other issues.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
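
As a rough illustration of the watermark scheme in the issue description above, a minimal Java sketch might look like the following. The class name `InFlightLimiter`, its method names, and the threshold values are hypothetical and are not taken from Cassandra's code base. The sketch only tracks outstanding requests and bytes and reports whether client reads should be paused; actually disabling and re-enabling reads on a client connection is left to the caller.

```java
/**
 * Hypothetical sketch of high/low watermark tracking for in-flight work.
 * Not Cassandra's actual implementation; all names are assumptions.
 */
final class InFlightLimiter {
    private final long lowRequests, highRequests;
    private final long lowBytes, highBytes;

    private long inFlightRequests = 0;
    private long inFlightBytes = 0;
    private boolean readsPaused = false;

    InFlightLimiter(long lowRequests, long highRequests, long lowBytes, long highBytes) {
        if (lowRequests >= highRequests || lowBytes >= highBytes) {
            throw new IllegalArgumentException("low watermark must be below high watermark");
        }
        this.lowRequests = lowRequests;
        this.highRequests = highRequests;
        this.lowBytes = lowBytes;
        this.highBytes = highBytes;
    }

    /** Record an accepted request; returns true if client reads should be paused. */
    synchronized boolean onRequestStart(long bytes) {
        inFlightRequests++;
        inFlightBytes += bytes;
        // Pause as soon as either counter reaches its high watermark.
        if (inFlightRequests >= highRequests || inFlightBytes >= highBytes) {
            readsPaused = true;
        }
        return readsPaused;
    }

    /** Record a completed request; returns true if client reads should stay paused. */
    synchronized boolean onRequestComplete(long bytes) {
        inFlightRequests--;
        inFlightBytes -= bytes;
        // Resume only once both counters are back below their low watermarks,
        // so the limiter does not flap around a single threshold.
        if (readsPaused && inFlightRequests < lowRequests && inFlightBytes < lowBytes) {
            readsPaused = false;
        }
        return readsPaused;
    }

    synchronized boolean readsPaused() {
        return readsPaused;
    }
}
```

The gap between the two watermarks provides hysteresis: reads stay disabled until load has dropped well below the point that triggered the pause. Note that a limiter like this is local to one coordinator, so it does not by itself address the many-to-one concern raised in the comment above.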