[
https://issues.apache.org/jira/browse/CASSANDRA-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217445#comment-13217445
]
Peter Schuller edited comment on CASSANDRA-3722 at 2/27/12 7:38 PM:
--------------------------------------------------------------------
I'm -0 on the original idea in this ticket, but +1 on more generic changes that
cover the original use case as well as, if not better than, it does anyway. I
think that instead of trying to predict exactly how some particular event like
compaction will behave, we should just get better at responding to what is
actually going on:
* We have CASSANDRA-2540, which can help avoid blocking uselessly on a dropped
or slow request even when we haven't had the opportunity to react to overall
behavior yet (I have a partial patch, but it breaks read repair and I haven't
had time to finish it).
* Taking the number of outstanding requests into account is IMO a necessity.
There is plenty of precedent for anyone who wants it (least-used-connections
policies in various load balancers), but more importantly it would clearly help
in several situations (see the sketch after this list), including:
** Sudden GC pause of a node
** Sudden death of a node
** Sudden page cache eviction and slowness of a node, before the snitch
figures it out
** Constantly overloaded node; even with the dynamic snitch it would improve
the situation, since fewer requests are affected by a dynsnitch reset
** Packet loss/hiccups/whatever across DCs
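To make the least-outstanding-requests idea concrete, here is a rough sketch;
the class and method names are made up for illustration and are not existing
Cassandra code:
{code:java}
import java.net.InetAddress;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical tracker (not an existing Cassandra class): keeps a per-endpoint
// count of requests in flight and sorts replicas by it, analogous to a
// least-used-connections policy in a load balancer.
public class OutstandingRequestTracker
{
    private final Map<InetAddress, AtomicInteger> outstanding = new ConcurrentHashMap<>();

    private AtomicInteger counterFor(InetAddress endpoint)
    {
        return outstanding.computeIfAbsent(endpoint, e -> new AtomicInteger());
    }

    // Call just before a request is sent to an endpoint.
    public void requestSent(InetAddress endpoint)
    {
        counterFor(endpoint).incrementAndGet();
    }

    // Call when a response arrives, or when the request times out.
    public void responseReceived(InetAddress endpoint)
    {
        counterFor(endpoint).decrementAndGet();
    }

    // Order candidate replicas so the least-loaded endpoint is tried first.
    public void sortByLeastOutstanding(List<InetAddress> replicas)
    {
        replicas.sort(Comparator.comparingInt(e -> counterFor(e).get()));
    }
}
{code}
The point is simply that whichever replica currently has the fewest requests
in flight gets tried first, which reacts within roughly one round-trip to a
pause or death instead of waiting for latency statistics to catch up.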
There is some potential for foot-shooting in the sense that if a node is
broken in a way that makes it respond with incorrect data, but faster than
anyone else, it will tend to "swallow" all the traffic. Honestly, though, that
feels like a minor concern based on what I've seen actually happen in
production clusters. This would change, however, if we ever start sending
non-successes back over inter-node RPC.
My only major concern is the potential performance impact of keeping track of
the number of outstanding requests, but if that *does* become a problem it can
be made probabilistic - track only N% of all requests. Less impact, but also a
less immediate response to what's happening.
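If the bookkeeping overhead does turn out to matter, the sampling could be as
simple as the following sketch (again, names are hypothetical):
{code:java}
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sampling guard: only a configurable fraction of requests pay
// the tracking cost; the rest bypass the outstanding-request bookkeeping.
public class SampledTracking
{
    private final double trackedFraction; // e.g. 0.10 to track roughly 10% of requests

    public SampledTracking(double trackedFraction)
    {
        this.trackedFraction = trackedFraction;
    }

    // Decide per request whether to update the outstanding-request counters.
    public boolean shouldTrack()
    {
        return ThreadLocalRandom.current().nextDouble() < trackedFraction;
    }
}
{code}
The trade-off is exactly as described: less overhead, but the counters lag
further behind what is actually happening.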
This will also have the side effect of mitigating sudden bursts of promotion
into old-gen if we combine it with proactively dropping read-repair messages
for overloaded nodes (effectively prioritizing data reads), hence helping with
CASSANDRA-3853.
{quote}
Should we T (send additional requests which are not part of the normal
operations) the requests until the other node recovers?
{quote}
In the absence of read repair, we'd have to do speculative reads, as Stu has
previously noted. With read repair turned on this is not an issue, because the
node will still receive requests and eventually warm up. Only with read repair
turned off do we avoid sending requests to more than the first N endpoints,
with N being what the CL requires.
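In other words, the routing decision being described is roughly the following
(a simplified sketch, not the actual StorageProxy code; names are
illustrative):
{code:java}
import java.net.InetAddress;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Simplified sketch of the routing decision: when read repair fires for a
// request, every replica sees it (data read plus digest reads) and stays warm;
// with read repair off, only the first blockFor endpoints ever get traffic.
public final class ReadTargets
{
    public static List<InetAddress> targets(List<InetAddress> sortedReplicas,
                                            int blockFor,            // endpoints required by the CL
                                            double readRepairChance)
    {
        boolean readRepair = ThreadLocalRandom.current().nextDouble() < readRepairChance;
        return readRepair
               ? sortedReplicas
               : sortedReplicas.subList(0, Math.min(blockFor, sortedReplicas.size()));
    }
}
{code}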
Semi-relatedly, I think it would be a good idea to make the proximity sorting
probabilistic in nature so that we don't do a binary flip back and forth
between who gets data vs. digest reads, or who doesn't get reads at all. That
might mitigate this problem, but not help fundamentally, since the rate of
warm-up would still decrease with a node being slow.
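Something along these lines is what I mean by probabilistic sorting; a sketch
only, with made-up names, where the snitch scores get a bit of random noise
before sorting so nearly-equal endpoints trade places instead of flipping at a
hard threshold:
{code:java}
import java.net.InetAddress;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative only: jitter each snitch score before sorting, so the choice of
// who gets the data read shifts gradually rather than flipping back and forth.
public final class JitteredProximitySort
{
    private static final double JITTER = 0.1; // up to +/-10% noise; an arbitrary choice

    public static void sort(List<InetAddress> replicas, Map<InetAddress, Double> scores)
    {
        Map<InetAddress, Double> jittered = new HashMap<>();
        for (InetAddress replica : replicas)
        {
            double noise = 1.0 + (ThreadLocalRandom.current().nextDouble() * 2 - 1) * JITTER;
            jittered.put(replica, scores.getOrDefault(replica, 0.0) * noise);
        }
        // Lower score = closer/faster, same convention as the dynamic snitch.
        replicas.sort(Comparator.comparingDouble(jittered::get));
    }
}
{code}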
I do want to make this point though: *every single production cluster* I have
been involved with so far has been such that you basically never want to turn
read repair off. Not because of read repair itself, but because of the traffic
it generates. Having nodes not receive traffic is extremely dangerous under
most circumstances, as it leaves them cold, only to suddenly explode and cause
timeouts and other bad behavior as soon as e.g. some neighbor goes down and
they suddenly start taking traffic. This is an easy way to make production
clusters fall over. If your workload is entirely in memory or otherwise not
reliant on caching, the problem is much less pronounced, but even then I would
generally recommend keeping it turned on, if only because your nodes will have
to be able to take the additional load *anyway* if you are to survive other
nodes in the neighborhood going down. It just makes clusters much easier to
reason about.
> Send Hints to Dynamic Snitch when Compaction or repair is going on for a node.
> ------------------------------------------------------------------------------
>
> Key: CASSANDRA-3722
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3722
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Affects Versions: 1.1.0
> Reporter: Vijay
> Assignee: Vijay
> Priority: Minor
>
> Currently the dynamic snitch looks at latency to figure out which node will
> be better at serving requests. This works great, but part of the traffic is
> sent just to collect this data... There is also a window in which the snitch
> doesn't know about a major event that is about to happen on the node which is
> going to receive the data request.
> It would be great if we could send some sort of hints to the snitch so it can
> score based on known events that cause higher latencies.
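For reference, the proposal in the description amounts to something like the
following (purely an illustration, not existing code): a node undergoing
compaction or repair advertises a penalty, and the snitch folds that penalty
into its latency-based score.
{code:java}
import java.net.InetAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustration of the ticket's proposal (hypothetical class, not existing
// code): known high-latency events publish a penalty "hint" for an endpoint,
// and the snitch adds it to the measured latency score when ranking replicas.
public class HintedScoring
{
    private final Map<InetAddress, Double> penalties = new ConcurrentHashMap<>();

    // Called when a known event such as compaction or repair starts on a node.
    public void setPenalty(InetAddress endpoint, double penalty)
    {
        penalties.put(endpoint, penalty);
    }

    // Called when the event finishes.
    public void clearPenalty(InetAddress endpoint)
    {
        penalties.remove(endpoint);
    }

    // Combine the measured latency score with any advertised penalty.
    public double effectiveScore(InetAddress endpoint, double measuredScore)
    {
        return measuredScore + penalties.getOrDefault(endpoint, 0.0);
    }
}
{code}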