[
https://issues.apache.org/jira/browse/CASSANDRA-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217445#comment-13217445
]
Peter Schuller edited comment on CASSANDRA-3722 at 2/27/12 7:38 PM:
--------------------------------------------------------------------
I'm -0 on the original idea in this ticket, but +1 on more generic changes that
cover the original use case as well as, if not better than, it does anyway. I
think that instead of trying to predict exactly how some particular event like
compaction will behave, we should just get better at responding to what is
actually going on:
* We have CASSANDRA-2540, which can help avoid blocking uselessly on a dropped
or slow request even when we haven't had the opportunity to react to overall
behavior yet (I have a partial patch, but it breaks read repair and I haven't
had time to finish it).
* Taking the number of outstanding requests into account is IMO a necessity.
There is plenty of precedent for anyone who wants it (least-used-connections
policies in various load balancers), but more importantly it would clearly help
in several situations (see the sketch after this list), including:
** Sudden GC pause of a node
** Sudden death of a node
** Sudden page cache eviction and slowness of a node, before the snitch
figures it out
** Constantly overloaded node; even with the dynamic snitch it would improve
the situation, since fewer requests are affected by a dynsnitch reset
** Packet loss/hiccups/whatever across DCs
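To make the least-outstanding-requests idea concrete, here is a rough sketch;
the class and method names are made up for illustration and are not existing
Cassandra code:
{code:java}
import java.net.InetAddress;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical tracker (not an existing Cassandra class): keeps a per-endpoint
// count of requests in flight and sorts replicas by it, analogous to a
// least-used-connections policy in a load balancer.
public class OutstandingRequestTracker
{
    private final Map<InetAddress, AtomicInteger> outstanding = new ConcurrentHashMap<>();

    private AtomicInteger counterFor(InetAddress endpoint)
    {
        return outstanding.computeIfAbsent(endpoint, e -> new AtomicInteger());
    }

    // Call just before a request is sent to an endpoint.
    public void requestSent(InetAddress endpoint)
    {
        counterFor(endpoint).incrementAndGet();
    }

    // Call when a response arrives, or when the request times out.
    public void responseReceived(InetAddress endpoint)
    {
        counterFor(endpoint).decrementAndGet();
    }

    // Order candidate replicas so the least-loaded endpoint is tried first.
    public void sortByLeastOutstanding(List<InetAddress> replicas)
    {
        replicas.sort(Comparator.comparingInt(e -> counterFor(e).get()));
    }
}
{code}
The point is simply that whichever replica currently has the fewest requests
in flight gets tried first, which reacts within roughly one round-trip to a
pause or death instead of waiting for latency statistics to catch up.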
There is some potential for foot-shooting in the sense that if a node is
broken in a way that makes it respond with incorrect data, but faster than
anyone else, it will tend to "swallow" all the traffic. Honestly, though, that
feels like a minor concern based on what I've seen actually happen in
production clusters. This would change, however, if we ever start sending
non-successes back over inter-node RPC.
My only major concern is the potential performance impact of keeping track of
the number of outstanding requests, but if that *does* become a problem it can
be made probabilistic - track only N% of all requests. Less impact, but also a
less immediate response to what's happening.
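If the bookkeeping overhead does turn out to matter, the sampling could be as
simple as the following sketch (again, names are hypothetical):
{code:java}
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sampling guard: only a configurable fraction of requests pay
// the tracking cost; the rest bypass the outstanding-request bookkeeping.
public class SampledTracking
{
    private final double trackedFraction; // e.g. 0.10 to track roughly 10% of requests

    public SampledTracking(double trackedFraction)
    {
        this.trackedFraction = trackedFraction;
    }

    // Decide per request whether to update the outstanding-request counters.
    public boolean shouldTrack()
    {
        return ThreadLocalRandom.current().nextDouble() < trackedFraction;
    }
}
{code}
The trade-off is exactly as described: less overhead, but the counters lag
further behind what is actually happening.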
This will also have the side effect of mitigating sudden bursts of promotion
into old-gen if we combine it with proactively dropping read-repair messages
for overloaded nodes (effectively prioritizing data reads), hence helping with
CASSANDRA-3853.
{quote}
Should we T (send additional requests which are not part of the normal
operations) the requests until the other node recovers?
{quote}
In the absence of read repair, we'd have to do speculative reads, as Stu has
previously noted. With read repair turned on this is not an issue, because the
node will still receive requests and eventually warm up. Only with read repair
turned off do we avoid sending requests to more than the first N endpoints,
with N being what the CL requires.
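In other words, the routing decision being described is roughly the following
(a simplified sketch, not the actual StorageProxy code; names are
illustrative):
{code:java}
import java.net.InetAddress;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Simplified sketch of the routing decision: when read repair fires for a
// request, every replica sees it (data read plus digest reads) and stays warm;
// with read repair off, only the first blockFor endpoints ever get traffic.
public final class ReadTargets
{
    public static List<InetAddress> targets(List<InetAddress> sortedReplicas,
                                            int blockFor,            // endpoints required by the CL
                                            double readRepairChance)
    {
        boolean readRepair = ThreadLocalRandom.current().nextDouble() < readRepairChance;
        return readRepair
               ? sortedReplicas
               : sortedReplicas.subList(0, Math.min(blockFor, sortedReplicas.size()));
    }
}
{code}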
Semi-relatedly, I think it would be a good idea to make the proximity sorting
probabilistic in nature so that we don't do a binary flip back and forth
between who gets data vs. digest reads, or who doesn't get reads at all. That
might mitigate this problem, but not help fundamentally, since the rate of
warm-up would still decrease with a node being slow.
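Something along these lines is what I mean by probabilistic sorting; a sketch
only, with made-up names, where the snitch scores get a bit of random noise
before sorting so nearly-equal endpoints trade places instead of flipping at a
hard threshold:
{code:java}
import java.net.InetAddress;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative only: jitter each snitch score before sorting, so the choice of
// who gets the data read shifts gradually rather than flipping back and forth.
public final class JitteredProximitySort
{
    private static final double JITTER = 0.1; // up to +/-10% noise; an arbitrary choice

    public static void sort(List<InetAddress> replicas, Map<InetAddress, Double> scores)
    {
        Map<InetAddress, Double> jittered = new HashMap<>();
        for (InetAddress replica : replicas)
        {
            double noise = 1.0 + (ThreadLocalRandom.current().nextDouble() * 2 - 1) * JITTER;
            jittered.put(replica, scores.getOrDefault(replica, 0.0) * noise);
        }
        // Lower score = closer/faster, same convention as the dynamic snitch.
        replicas.sort(Comparator.comparingDouble(jittered::get));
    }
}
{code}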
I do want to make this point though: *every single production cluster* I have
been involved with so far has been such that you basically never want to turn
read repair off. Not because of read repair itself, but because of the traffic
it generates. Having nodes not receive traffic is extremely dangerous under
most circumstances, as it leaves them cold, only to suddenly explode and cause
timeouts and other bad behavior as soon as e.g. some neighbor goes down and
they suddenly start taking traffic. This is an easy way to make production
clusters fall over. If your workload is entirely in memory or otherwise not
reliant on caching, the problem is much less pronounced, but even then I would
generally recommend keeping it turned on, if only because your nodes will have
to be able to take the additional load *anyway* if you are to survive other
nodes in the neighborhood going down. It just makes clusters much easier to
reason about.
> Send Hints to Dynamic Snitch when Compaction or repair is going on for a node.
> ------------------------------------------------------------------------------
>
> Key: CASSANDRA-3722
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3722
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Affects Versions: 1.1.0
> Reporter: Vijay
> Assignee: Vijay
> Priority: Minor
>
> Currently the dynamic snitch looks at latency to figure out which node will
> be better at serving requests. This works great, but part of the traffic is
> sent just to collect this data... There is also a window in which the snitch
> doesn't know about a major event that is about to happen on the node which is
> going to receive the data request.
> It would be great if we could send some sort of hints to the snitch so it can
> score based on known events that cause higher latencies.
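For reference, the proposal in the description amounts to something like the
following (purely an illustration, not existing code): a node undergoing
compaction or repair advertises a penalty, and the snitch folds that penalty
into its latency-based score.
{code:java}
import java.net.InetAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustration of the ticket's proposal (hypothetical class, not existing
// code): known high-latency events publish a penalty "hint" for an endpoint,
// and the snitch adds it to the measured latency score when ranking replicas.
public class HintedScoring
{
    private final Map<InetAddress, Double> penalties = new ConcurrentHashMap<>();

    // Called when a known event such as compaction or repair starts on a node.
    public void setPenalty(InetAddress endpoint, double penalty)
    {
        penalties.put(endpoint, penalty);
    }

    // Called when the event finishes.
    public void clearPenalty(InetAddress endpoint)
    {
        penalties.remove(endpoint);
    }

    // Combine the measured latency score with any advertised penalty.
    public double effectiveScore(InetAddress endpoint, double measuredScore)
    {
        return measuredScore + penalties.getOrDefault(endpoint, 0.0);
    }
}
{code}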