[
https://issues.apache.org/jira/browse/CASSANDRA-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425410#comment-13425410
]
Brandon Williams commented on CASSANDRA-3533:
---------------------------------------------
bq. Is there anything forcing a next attempt though, besides gossip (1/N chance
per round)?
Hmm, actually, no, I was mistaken there.
bq. But you still have things like GC-based "flapping" that can cause FD to
mark a node down over-pessimistically. So I don't think I buy that this is an
argument for not making FD more robust – since we already have to deal with "FD
is too pessimistic" for this case.
I actually don't think, at least for this example, being overly pessimistic is
an issue. On a healthy network (0.3ms ping) it takes 18-19s for the FD to mark
a host down with the default phi. If the GC flapping is so bad it can't get a
gossip change out in that time, the node probably _should_ be marked down.
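As a sanity check on that 18-19s figure, here is a minimal sketch of the
phi-accrual math, assuming the exponential inter-arrival model from the
phi-accrual paper; this is not Cassandra's actual FailureDetector code, just
the arithmetic behind the number, with ~1s gossip heartbeats and the default
phi_convict_threshold of 8.
{code:java}
// Minimal sketch (assumption: exponential inter-arrival model), not the real
// FailureDetector. Shows why phi crosses the default threshold of 8 after
// roughly 18 seconds of silence when heartbeats arrive about once per second.
public class PhiAccrualSketch
{
    // phi(t) = -log10(P(no heartbeat for t)) = (t / meanInterval) * log10(e)
    static double phi(double silenceMillis, double meanIntervalMillis)
    {
        return (silenceMillis / meanIntervalMillis) * Math.log10(Math.E);
    }

    public static void main(String[] args)
    {
        double meanIntervalMillis = 1000; // ~1s gossip rounds on a healthy network
        double threshold = 8;             // default phi_convict_threshold
        // Solve phi(t) = threshold for t:
        double convictAfterMillis = threshold * meanIntervalMillis / Math.log10(Math.E);
        System.out.printf("convicted after ~%.1f seconds%n", convictAfterMillis / 1000);
        // prints "convicted after ~18.4 seconds"
    }
}
{code}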
bq. (Fundamentally though I don't think we'll get much mileage out of trying to
second-guess FD, so I'd rather make FD as accurate as we can. And I suspect
that "StorageProxy uses FD-supplemented-by-X and the rest of the system using
normal FD" is going to cause weirdness.)
You're probably right. Let's take a step back and examine what we're trying to
solve. Node X can talk to Y, and Y can talk to Z, but X and Z are partitioned
and can't communicate; surrogate gossip traffic via Y nevertheless makes them
both think they can. The fallout from this is that they'll keep attempting to
send messages (and thus connect) to each other. In practice, though, from a
client's perspective:
* writes will get ack'd by whichever replicas respond the fastest. Assuming
RF=3 and X being the coordinator, the fact that it wrote a local copy and Y
responded is enough to satisfy every consistency level except ALL (see the
sketch after this list).
* reads will get attempted against Z from X, and will have to time out.
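To make the write case concrete, here is a small illustrative sketch (not
Cassandra's ConsistencyLevel code) of the ack counting for RF=3 when X
coordinates a write, its local copy succeeds, Y responds, and Z is unreachable
behind the partition: everything up to QUORUM succeeds, only ALL needs Z.
{code:java}
// Illustrative ack counting for RF=3; names and layout are for the example only.
import java.util.LinkedHashMap;
import java.util.Map;

public class WriteAckSketch
{
    public static void main(String[] args)
    {
        int rf = 3;
        int acksReceived = 2; // X's local write + Y's response; Z never answers

        Map<String, Integer> required = new LinkedHashMap<>();
        required.put("ONE", 1);
        required.put("TWO", 2);
        required.put("QUORUM", rf / 2 + 1); // 2 for RF=3
        required.put("ALL", rf);            // 3: needs the unreachable Z

        required.forEach((cl, needed) ->
            System.out.printf("%-6s needs %d ack(s): %s%n",
                              cl, needed, acksReceived >= needed ? "succeeds" : "times out"));
    }
}
{code}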
Now let's look at the read scenario in a post-1.2 world. The dsnitch, after
CASSANDRA-3722, will penalize Z in X's eyes much faster than pre-1.2 (and thus
prevent dogpiling requests onto it while waiting for the rpc timeout) and quit
trying to use it, at least until the reset interval, when the process begins
again. But this is really no different from Z _actually_ dying in a way that
turns the network route into a black hole (like force-suspending the JVM,
which is how the dsnitch change was tested, and it worked well.)
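For illustration, here is a rough sketch of that kind of latency-score-based
penalization. The class name, weights, cutoff and reset behavior are
assumptions made up for the example, not the real DynamicEndpointSnitch or
the exact CASSANDRA-3722 logic.
{code:java}
// Hedged sketch: score replicas by recent latency, stop routing reads to a
// replica that looks drastically worse than the best one, and periodically
// reset the scores so it gets another chance.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DsnitchSketch
{
    private final Map<String, Double> scores = new HashMap<>();
    private static final double ALPHA = 0.75;          // weight given to the newest sample
    private static final double BADNESS_RATIO = 10.0;  // assumed cutoff vs. the best replica

    // Record an observed read latency (or a timeout penalty) for a replica.
    void recordLatency(String replica, double millis)
    {
        scores.merge(replica, millis, (old, fresh) -> (1 - ALPHA) * old + ALPHA * fresh);
    }

    // The periodic "reset interval": wipe scores so a penalized replica is retried.
    void resetScores()
    {
        scores.clear();
    }

    // Order replicas by score and drop any that look far worse than the best,
    // so the coordinator stops dogpiling requests onto a black-holed node.
    List<String> orderForRead(List<String> replicas)
    {
        double best = replicas.stream()
                              .mapToDouble(r -> scores.getOrDefault(r, 0.0))
                              .min().orElse(0.0);
        List<String> ordered = new ArrayList<>(replicas);
        ordered.sort(Comparator.comparingDouble((String r) -> scores.getOrDefault(r, 0.0)));
        if (best > 0)
            ordered.removeIf(r -> scores.getOrDefault(r, 0.0) > best * BADNESS_RATIO);
        return ordered;
    }

    public static void main(String[] args)
    {
        DsnitchSketch snitch = new DsnitchSketch();
        snitch.recordLatency("Y", 2.0);
        snitch.recordLatency("Z", 10000.0); // Z keeps timing out behind the partition
        System.out.println(snitch.orderForRead(List.of("Y", "Z"))); // -> [Y]
    }
}
{code}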
So I suppose my question is, what is the problem here we still need to solve?
> TimeoutException when there is a firewall issue.
> ------------------------------------------------
>
> Key: CASSANDRA-3533
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3533
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Vijay
> Assignee: Brandon Williams
> Priority: Minor
> Fix For: 1.2
>
> Attachments: 3533.txt
>
>
> When one node in the cluster is not able to talk to the other DC/RAC due to
> a firewall or network related issue (StorageProxy calls fail), and the nodes
> are NOT marked down because at least one node in the cluster can talk to the
> other DC/RAC, we get a TimeoutException instead of throwing an
> UnavailableException.
> The problem with this:
> 1) It is hard to monitor/identify these errors.
> 2) It is hard to differentiate from the client whether the node is bad vs. a
> bad query.
> 3) When this issue happens we have to wait for at least the RPC timeout to
> know that the query won't succeed.
> Possible Solution: when marking a node down we might want to check whether
> the node is actually alive by trying to communicate with it, so we can be
> sure that the node is actually alive.
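For what it's worth, the check proposed in the description could look roughly
like the sketch below: before believing gossip that arrived via a surrogate,
the local node verifies it can reach the endpoint directly, and otherwise
keeps it down locally so clients get an UnavailableException right away
instead of waiting out the rpc timeout. All names, ports and timeouts here are
illustrative, not Cassandra's actual gossip implementation.
{code:java}
// Hedged sketch of a direct liveness check; hypothetical names and values.
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class DirectLivenessCheck
{
    // Returns true only if a direct TCP connection to the node succeeds.
    static boolean canReachDirectly(InetSocketAddress node, int timeoutMillis)
    {
        try (Socket s = new Socket())
        {
            s.connect(node, timeoutMillis);
            return true;
        }
        catch (IOException e)
        {
            return false; // firewalled or partitioned from our point of view
        }
    }

    static void onGossipSaysAlive(InetSocketAddress node)
    {
        if (canReachDirectly(node, 2000))
            System.out.println(node + ": marking UP");
        else
            System.out.println(node + ": keeping DOWN, surrogate gossip notwithstanding");
    }

    public static void main(String[] args)
    {
        onGossipSaysAlive(new InetSocketAddress("10.0.0.3", 7000)); // hypothetical node Z
    }
}
{code}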