[ 
https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536846#comment-14536846
 ] 

Jonathan Shook edited comment on CASSANDRA-9318 at 5/10/15 12:32 AM:
---------------------------------------------------------------------

I would venture that a solid load shedding system may improve the degenerate 
overloading case, but it is not the preferred method for dealing with 
overloading for most users. The concept of back-pressure is more squarely what 
people expect, for better or worse.

Here is what I think reasonable users want to see, with some variations:
1) The system performs with stability, up to the workload that it is able to 
handle with stability.
2a) Once it reaches that limit, it starts pushing back in terms of how quickly 
it accepts new work. This means that it simply blocks the operations or 
submissions of new requests with some useful bound that is determined by the 
system. It does not yet have to shed load. It does not yet have to give 
exceptions. This is a very reasonable expectation for most users. This is what 
they expect. Load shedding is a term of art which does not change the users' 
expectations.
2b) Once it reaches that limit, it starts throwing OE to the client. It does 
not have to shed load yet. (Perhaps this exception or something like it can be 
thrown _before_ load shedding occurs.) This is a very reasonable expectation 
for users who are savvy enough to do active load management at the client 
level. It may have to start writing hints, but if you are writing hints merely 
because of load, this might not be the best justification for having the hints 
system kick in. To me this is inherently a convenient remedy for the wrong 
problem, even if it works well. Yes, hints are there as a general mechanism, 
but it does not solve the problem of needing to know when the system is being 
pushed beyond capacity and how to handle it proactively. You could also say 
that hints actively hurt capacity when you need them most sometimes. They are 
expensive to process given the current implementation, and will always be "load 
shifting" even at theoretical best. Still we need them for node availability 
concerns, although we should be careful not to use them as a crutch for general 
capacity issues.
2c) Once it reaches that limit, it starts backlogging (without a helpful 
signature of such in the responses, maybe BackloggingException with some queue 
estimate). This is a very reasonable expectation for users who are savvy enough 
to manage their peak and valley workloads in a sensible way. Sometimes you 
actually want to tax the ingest and flush side of the system for a bit before 
allowing it to switch modes and catch up with compaction. The fact that C* can 
do this is an interesting capability, but those who want backpressure will not 
easily see it that way.
2d) If the system is being pushed beyond its capacity, then it may have to shed 
load. This should only happen if the user has decided that they want to be 
responsible for such and have pushed the system beyond the reasonable limit 
without paying attention to the indications in 2a, 2b, and 2c. In the current 
system, this decision is already made for them. They have no choice.

In a more optimistic world, users would get near optimal performance for a well 
tuned workload with back-pressure active throughout the system, or something 
very much like it. We could call it a different kind of scheduler, different 
queue management methods, or whatever. 
As long as the user could prioritize stability at some bounded load over 
possible instability at an over-saturating load, I think they would in most 
cases. Like I said, they really don't have this choice right now. I know this 
is not trivial. We can't remove the need to make sane judgments about sizing 
and configuration. We might be able to, however, make the system ramp more 
predictably up to saturation, and behave more reasonably at that level.

Order of precedence, How to designate a mode of operation, or any other 
concerns aren't really addressed here. I just provided the examples above as 
types of behaviors which are nuanced yet perfectly valid for different types of 
system designs. The real point here is that there is not a single overall 
QoS/capacity/back-pressure behavior which is going to be acceptable to all 
users. Still, we need to ensure stability under saturating load where possible. 
I would like to think that with CASSANDRA-8099 that we can start discussing 
some of the client-facing back-pressure ideas more earnestly. I do believe that 
these ideas are all compatible ideas on a spectrum of behavior. They are not 
mutually exclusive from a design/implementation perspective. It's possible that 
they could be specified per operation, even, with some traffic yield to others 
due to client policies. For example, a lower priority client could yield when 
it knows the cluster is approaching saturation (Responses could contain a % 
loading level estimate), while higher priority data stream could keep writing 
data as long as the backlogging queue level was less than a certain amount. ( 
perhaps a score which factors in the time delay to the oldest planned but 
uncompacted data.. )

We can come up with methods to improve the reliable and responsive capacity of 
the system even with some internal load management. If the first cut ends up 
being sub-optimal, then we can measure it against non-bounded workload tests 
and strive to close the gap. If it is implemented in a way that can support 
multiple usage scenarios, as described above, then such a limitation might be 
"unlimited", "bounded at level ___", or "bounded by inline resource 
management".. But in any case would be controllable by some users/admin, 
client.. If we could ultimately give the categories of users above the ability 
to enable the various modes, then the 2a) scenario would be perfectly desirable 
for many users already even if the back-pressure logic only gave you 70% of the 
effective system capacity. Once testing shows that performance with active 
back-pressure to the client is close enough to the unbounded workloads, it 
could be enabled by default.

Summary: We still need reasonable back-pressure support throughout the system 
and eventually to the client. Features like this that can be a stepping stone 
towards such are still needed. The most perfect load shedding and hinting 
systems will still not be a sufficient replacement for back-pressure and 
capacity management.

I know this comment contains lots of tangents to the original ticket. As well, 
it doesn't speak specifically to the implementation details or ideas directly. 
If we should take this comment and move it to another ticket, let me know. I 
thought the emphasis towards back-pressure mechanisms was appropriate, but it 
did get a bit wordy.



was (Author: jshook):
I would venture that a solid load shedding system may improve the degenerate 
overloading case, but it is not the preferred method for dealing with 
overloading for most users. The concept of back-pressure is more squarely what 
people expect, for better or worse.

Here is what I think reasonable users want to see, with some variations:
1) The system performs with stability, up to the workload that it is able to 
handle with stability.
2a) Once it reaches that limit, it starts pushing back in terms of how quickly 
it accepts new work. This means that it simply blocks the operations or 
submissions of new requests with some useful bound that is determined by the 
system. It does not yet have to shed load. It does not yet have to give 
exceptions. This is a very reasonable expectation for most users. This is what 
they expect. Load shedding is a term of art which does not change the users' 
expectations.
2b) Once it reaches that limit, it starts throwing OE to the client. It does 
not have to shed load yet. (Perhaps this exception or something like it can be 
thrown _before_ load shedding occurs.) This is a very reasonable expectation 
for users who are savvy enough to do active load management at the client 
level. It may have to start writing hints, but if you are writing hints merely 
because of load, this might not be the best justification for having the hints 
system kick in. To me this is inherently a convenient remedy for the wrong 
problem, even if it works well. Yes, hints are there as a general mechanism, 
but it does not solve the problem of needing to know when the system is being 
pushed beyond capacity and how to handle it proactively. You could also say 
that hints actively hurt capacity when you need them most sometimes. They are 
expensive to process given the current implementation, and will always be "load 
shifting" even at theoretical best. Still we need them for node availability 
concerns, although we should be careful not to use them as a crutch for general 
capacity issues.
2c) Once it reaches that limit, it starts backlogging (without a helpful 
signature of such in the responses, maybe BackloggingException with some queue 
estimate). This is a very reasonable expectation for users who are savvy enough 
to manage their peak and valley workloads in a sensible way. Sometimes you 
actually want to tax the ingest and flush side of the system for a bit before 
allowing it to switch modes and catch up with compaction. The fact that C* can 
do this is an interesting capability, but those who want backpressure will not 
easily see it that way.
2d) If the system is being pushed beyond its capacity, then it may have to shed 
load. This should only happen if the user has decided that they want to be 
responsible for such and have pushed the system beyond the reasonable limit 
without paying attention to the indications in 2a, 2b, and 2c. In the current 
system, this decision is already made for them. They have no choice.

In a more optimistic world, users would get near optimal performance for a well 
tuned workload with back-pressure active throughout the system, or something 
very much like it. We could call it a different kind of scheduler, different 
queue management methods, or whatever. 
As long as the user could prioritize stability at some bounded load over 
possible instability at an over-saturating load, I think they would in most 
cases. Like I said, they really don't have this choice right now. I know this 
is not trivial. We can't remove the need to make sane judgments about sizing 
and configuration. We might be able to, however, make the system ramp more 
predictably up to saturation, and behave more reasonable at that level.

Order of precedence, How to designate a mode of operation, or any other 
concerns aren't really addressed here. I just provided the examples above as 
types of behaviors which are nuanced yet perfectly valid for different types of 
system designs. The real point here is that there is not a single overall 
QoS/capacity/back-pressure behavior which is going to be acceptable to all 
users. Still, we need to ensure stability under saturating load where possible. 
I would like to think that with CASSANDRA-8099 that we can start discussing 
some of the client-facing back-pressure ideas more earnestly. I do believe that 
these ideas are all compatible ideas on a spectrum of behavior. They are not 
mutually exclusive from a design/implementation perspective. It's possible that 
they could be specified per operation, even, with some traffic yield to others 
due to client policies. For example, a lower priority client could yield when 
it knows the cluster is approaching saturation (Responses could contain a % 
loading level estimate), while higher priority data stream could keep writing 
data as long as the backlogging queue level was less than a certain amount. ( 
perhaps a score which factors in the time delay to the oldest planned but 
uncompacted data.. )

We can come up with methods to improve the reliable and responsive capacity of 
the system even with some internal load management. If the first cut ends up 
being sub-optimal, then we can measure it against non-bounded workload tests 
and strive to close the gap. If it is implemented in a way that can support 
multiple usage scenarios, as described above, then such a limitation might be 
"unlimited", "bounded at level ___", or "bounded by inline resource 
management".. But in any case would be controllable by some users/admin, 
client.. If we could ultimately give the categories of users above the ability 
to enable the various modes, then the 2a) scenario would be perfectly desirable 
for many users already even if the back-pressure logic only gave you 70% of the 
effective system capacity. Once testing shows that performance with active 
back-pressure to the client is close enough to the unbounded workloads, it 
could be enabled by default.

Summary: We still need reasonable back-pressure support throughout the system 
and eventually to the client. Features like this that can be a stepping stone 
towards such are still needed. The most perfect load shedding and hinting 
systems will still not be a sufficient replacement for back-pressure and 
capacity management.

I know this comment contains lots of tangents to the original ticket. As well, 
it doesn't speak specifically to the implementation details or ideas directly. 
If we should take this comment and move it to another ticket, let me know. I 
thought the emphasis towards back-pressure mechanisms was appropriate, but it 
did get a bit wordy.


> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 2.1.x
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster 
> by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding 
> bytes and requests and if it reaches a high watermark disable read on client 
> connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't 
> introduce other issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to