On Jul 5, 2018, at 11:10 AM, Digimer <[email protected]> wrote:
>
> There is no effective benefit to 3+ nodes
That depends on your requirements and definition of terms.
For one very useful (and thus popular) definition of “success” in distributed
computing, you need at least four nodes, and that only allows you to catch a
single failed node at a time:
http://lamport.azurewebsites.net/pubs/reaching.pdf
See section 4 for the mathematical proof. This level of redundancy is
necessary to achieve what is called Byzantine fault tolerance, in which we
cannot implicitly trust every node and must therefore achieve consistency
through consensus and cross-checking.
A less stringent success criterion assumes a less malign environment and allows
for a minimum of three nodes to prevent system failure in the face of a single
fault at a time:
http://lamport.azurewebsites.net/pubs/lower-bound.pdf
With your suggested 2-node setup, you only get replication, which is not the
same thing as distributed fault tolerance:
http://paxos.systems/how.html
To see the distinction, consider what happens if one node in a mirrored pair
goes down and the surviving node is somehow corrupted before its peer returns:
the downed node is re-corrupted the moment it comes back up, because it
faithfully mirrors whatever its peer holds, corruption included.
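This failure mode is easy to demonstrate with a toy sketch in Python. The
`Node` class and `sync` function below are hypothetical illustrations, not any
real replication product; the point is only that blind mirroring copies bad
data as readily as good data:

```python
# Toy illustration: naive mirroring replicates corruption as readily as data.

class Node:
    def __init__(self):
        self.data = {}
        self.online = True

def sync(src, dst):
    """Blind mirror: dst takes whatever src has, corrupt or not."""
    dst.data = dict(src.data)

a, b = Node(), Node()
a.data["balance"] = 100
sync(a, b)                      # healthy replication

b.online = False                # node B goes down
a.data["balance"] = -999999     # node A is corrupted while B is away

b.online = True
sync(a, b)                      # B rejoins and mirrors the corruption
print(b.data["balance"])        # -999999: no second opinion to catch it
```

With only two nodes there is no third copy to vote against the corrupted
value, which is exactly the gap the quorum criteria below close.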
The n >= 2f+1 criterion prevents this problem in the face of normal hardware or
software failure, with a minimum of 3 nodes required to reliably detect and
cope with a single failure at a time.
The more stringent n >= 3m+1 criterion prevents this problem in the face of
nodes that may be actively hostile, with a minimum of 4 nodes required to
reliably catch a single traitorous node.
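The two bounds reduce to a couple of lines of arithmetic. Here is a small
Python sketch (function names are mine; `f` counts crash faults to tolerate,
`m` counts Byzantine faults, per the papers above):

```python
# Minimum cluster sizes to tolerate a given number of simultaneous faults.
# Crash (fail-stop) faults:  n >= 2f + 1
# Byzantine faults:          n >= 3m + 1

def min_nodes_crash(f: int) -> int:
    """Smallest cluster that still holds a majority after f crashed nodes."""
    return 2 * f + 1

def min_nodes_byzantine(m: int) -> int:
    """Smallest cluster that reaches agreement despite m traitorous nodes."""
    return 3 * m + 1

for faults in (1, 2, 3):
    print(faults, min_nodes_crash(faults), min_nodes_byzantine(faults))
# For a single fault: 3 nodes suffice against crashes, 4 against traitors.
```

Note how the Byzantine bound grows faster: tolerating two traitors already
requires seven nodes, versus five for two plain crashes.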
That terminology comes from one of the most important papers in distributed
computing, “The Byzantine Generals Problem,” co-authored by Leslie Lamport, who
was also involved in all of the above work:
https://www.microsoft.com/en-us/research/uploads/prod/2016/12/The-Byzantine-Generals-Problem.pdf
And who is Leslie Lamport? He is the 2013 Turing Award winner, which is as
close to the Nobel Prize as you can get with work in pure computer science:
https://en.wikipedia.org/wiki/Leslie_Lamport
So, if you want to argue with the papers above, you’d better bring some pretty
good arguments. :)
If his current affiliation with Microsoft bothers you, realize that he did all
of this work prior to joining Microsoft in 2001.
He’s also the “La” in LaTeX. :)
> (quorum is arguably helpful, but proper stonith, which you need anyway, makes
> it mostly a moot point).
STONITH is orthogonal to the concepts expressed in the CAP theorem:
https://en.wikipedia.org/wiki/CAP_theorem
It is mathematically impossible to escape the restrictions of the CAP theorem.
I’ve seen people try, but it inevitably amounts to Humpty Dumpty logic: "When I
use a word," Humpty Dumpty said, in rather a scornful tone, "it means just what
I choose it to mean — neither more nor less.” You can win as many arguments as
you like if you get to redefine the foundational terms upon which the argument
is based to suit your needs at different stages of the argument.
With that understanding, we can say that a 2-node setup results in one of the
following consequences:
OPTION 1: Give up Consistency (C)
---------------------------------
If you give up consistency, you get an AP system:
- While one node in a mirrored pair is down, the other remains up, giving you
availability (A), assuming all clients can normally reach both nodes.
- Since you have given up consistency (C), you can put the two nodes in
different data centers to gain partition tolerance (P) across the data paths
within and between those data centers. This preserves availability only if
all clients can reach both data centers.
In other words, you can tolerate either a loss of a single node or a partition
between them, but not both at the same time, and while either condition
applies, you cannot guarantee consistency in query replies.
This mode of operation is sometimes called “eventual consistency,” meaning that
it’s expected that there will be periods of time where multiple nodes are
online but they don’t all respond to identical queries with the same data.
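That divergence is straightforward to demonstrate. The sketch below models two
replicas during a partition, each accepting a write the other never sees, and
then reconciling with a last-writer-wins merge keyed on a logical timestamp
(the `merge` function and timestamps are my own illustrative assumptions, not
a specific product’s protocol):

```python
# "Eventual consistency" sketch: during a partition, each side accepts writes
# independently, so identical queries can return different answers until the
# partition heals and the replicas reconcile.

def merge(r1, r2):
    """Last-writer-wins reconciliation: keep the higher-timestamped value."""
    merged = {}
    for key in r1.keys() | r2.keys():
        v1 = r1.get(key, (0, None))
        v2 = r2.get(key, (0, None))
        merged[key] = max(v1, v2)          # tuples compare timestamp-first
    return merged

# Each replica maps key -> (timestamp, value).
east = {"motd": (1, "hello")}
west = dict(east)

# Partition: each data center takes a write the other never sees.
east["motd"] = (2, "hello from east")
west["motd"] = (3, "hello from west")
assert east["motd"] != west["motd"]        # same query, different answers

# Partition heals: both sides converge on the same value.
east = west = merge(east, west)
print(east["motd"])                        # (3, 'hello from west')
```

While the partition lasts, the AP system happily serves both answers; only
afterwards does it converge, which is exactly the “eventual” in the name.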
OPTION 2: Require Consistency
-----------------------------
In order to get consistency, a 2-node system must behave as a bare-minimum
quorum in a faux 3-node cluster, with the third node permanently MIA. As soon
as one of the two “remaining” nodes goes down, you have two bad choices:
1. Continue to treat it as a cluster. Since you have no second node against
which to check your transactions, the cluster must stop write operations until
the downed node is restored. Read-only operations can continue if we are
willing to accept the risk of the one remaining node somehow becoming
corrupted without going down entirely.
2. Split the cluster so that the remaining node is a standalone instance, which
means you have no distributed system any more, so the CAP theorem no longer
applies. You might continue to have C-as-in-ACID, if your software stack is
ACID-compliant in the first place, but you lose C-as-in-CAP until the second
system comes back up and is restored to full operation.
We can have a CA cluster in one of the two senses above. Only the first is a
true cluster, though, and it’s not useful in many applications, since it blocks
writes.
CP has no useful meaning in this situation: a network partition splits a
2-node cluster into two non-quorate halves, so you don’t actually have
partition tolerance. This is why the minimum number of nodes is 3: so that a
quorum remains on one side of the partition!
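The quorum arithmetic behind that claim fits in one function. This is a
generic strict-majority check, not any particular cluster manager’s
implementation:

```python
# Why 3 is the minimum for CP: after a partition, at most one side can hold
# a strict majority (quorum) of the original membership and keep serving
# writes. With 2 nodes, a 1/1 split leaves no majority anywhere.

def has_quorum(side: int, total: int) -> bool:
    """A partition side may proceed only with a strict majority of all nodes."""
    return side > total // 2

print(has_quorum(1, 2))   # False: 2-node split 1/1 -> total write outage
print(has_quorum(2, 3))   # True:  3-node split 2/1 -> majority side continues
print(has_quorum(1, 3))   # False: the minority side must stop writing
```

Note that an even-sized cluster split down the middle (2/2 of 4) also loses
quorum on both sides, which is relevant to the 4-node scenario below.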
Regardless of which path you pick above, you lose something relative to an n >=
2f+1 or an n > 2m+1 cluster. TANSTAAFL.
> Lastly, with 2x 2-node, you could lose two nodes
> (one per cluster) and still be operational. If you lose 2 nodes of a
> four node cluster, you're offline.
I don’t see how both statements can be true under the same CAP design
constraints, as laid out above. I think if you analyze the configuration of
your different scenarios, you’ll find that they’re not in the same CAP regime.
Lacking more information, I’m guessing the 2x2 configuration you’re thinking
of is AP, while your 4-node, minimum-3 configuration is CP.
A 4-node AP configuration could lose 2 nodes and keep running, as long as all
clients can see at least one of the remaining nodes, since you’ve given up on
continual consistency.
_______________________________________________
CentOS mailing list
[email protected]
https://lists.centos.org/mailman/listinfo/centos