On Jul 5, 2018, at 11:10 AM, Digimer <[email protected]> wrote:
> 
> There is no effective benefit to 3+ nodes

That depends on your requirements and definition of terms.

For one very useful (and thus popular) definition of “success” in distributed 
computing, you need at least four nodes, and that only allows you to catch a 
single failed node at a time:

    http://lamport.azurewebsites.net/pubs/reaching.pdf

See section 4 for the mathematical proof.  This level of redundancy is 
necessary to achieve what is called Byzantine fault tolerance, in which we 
cannot implicitly trust all of the nodes and thus must achieve consistency by 
consensus and cross-checks.

A less stringent success criterion assumes a less malign environment and allows 
for a minimum of three nodes to prevent system failure in the face of a single 
fault at a time:

    http://lamport.azurewebsites.net/pubs/lower-bound.pdf

With your suggested 2-node setup, you only get replication, which is not the 
same thing as distributed fault tolerance:

    http://paxos.systems/how.html

To see the distinction, consider what happens if one node in a mirrored pair 
goes down and the surviving node is then somehow corrupted before the downed 
node returns: the moment the downed node comes back up, it mirrors the 
corruption, and now both copies are bad.

The n >= 2f+1 criterion prevents this problem in the face of normal hardware or 
software failure, with a minimum of 3 nodes required to reliably detect and 
cope with a single failure at a time.

The more stringent n >= 3m+1 criterion prevents this problem in the face of 
nodes that may be actively hostile, with 4 nodes being required to reliably 
catch a single traitorous node.
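Both bounds are just arithmetic; here is a minimal sketch (the function names 
are only for illustration), where f is the number of crashed nodes you want to 
survive and m the number of traitorous ones:

    # Minimum cluster sizes implied by the two criteria above.
    def min_nodes_crash(f):        # n >= 2f + 1
        return 2 * f + 1

    def min_nodes_byzantine(m):    # n >= 3m + 1
        return 3 * m + 1

    print(min_nodes_crash(1))      # 3 nodes to survive one crashed node
    print(min_nodes_byzantine(1))  # 4 nodes to catch one traitorous node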

That terminology comes from one of the most important papers in distributed 
computing, “The Byzantine Generals Problem,” co-authored by Leslie Lamport, who 
was also involved in all of the above work:

    https://www.microsoft.com/en-us/research/uploads/prod/2016/12/The-Byzantine-Generals-Problem.pdf

And who is Leslie Lamport?  He is the 2013 Turing Award winner, which is as 
close to the Nobel Prize as you can get for work in pure computer science:

    https://en.wikipedia.org/wiki/Leslie_Lamport

So, if you want to argue with the papers above, you’d better bring some pretty 
good arguments. :)

If his current affiliation with Microsoft bothers you, realize that he did all 
of this work prior to joining Microsoft in 2001.

Also, he’s the “La” in LaTeX. :)

> (quorum is arguably helpful, but proper stonith, which you need anyway, makes 
> it mostly a moot point).

STONITH is orthogonal to the concepts expressed in the CAP theorem:

   https://en.wikipedia.org/wiki/CAP_theorem

It is mathematically impossible to escape the restrictions of the CAP theorem.  
I’ve seen people try, but it inevitably amounts to Humpty Dumpty logic: "When I 
use a word," Humpty Dumpty said, in rather a scornful tone, "it means just what 
I choose it to mean — neither more nor less.”  You can win as many arguments as 
you like if you get to redefine the foundational terms to suit your needs at 
each stage of the argument.

With that understanding, we can say that a 2-node setup results in one of the 
following consequences:


OPTION 1: Give up Consistency (C)
---------------------------------
If you give up consistency, you get an AP system:

- While one node in a mirrored pair is down, the other stays up, giving you 
availability (A), assuming all clients can normally see both nodes.

- Since you give up on consistency (C), you can put the two nodes in different 
data centers to gain partition tolerance (P) over the data paths within and 
between those data centers.  This also preserves availability only if all 
clients can see both data centers.

In other words, you can tolerate either a loss of a single node or a partition 
between them, but not both at the same time, and while either condition 
applies, you cannot guarantee consistency in query replies.

This mode of operation is sometimes called “eventual consistency,” meaning that 
it’s expected that there will be periods of time where multiple nodes are 
online but they don’t all respond to identical queries with the same data.
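To illustrate that window, here is a toy sketch with in-memory dicts standing 
in for replicas; a real AP store would converge via anti-entropy, vector 
clocks, or similar rather than the crude last-writer-wins merge used here:

    import time

    replica_1 = {}   # say, data center 1
    replica_2 = {}   # say, data center 2

    # During a partition, each replica keeps accepting writes on its own.
    replica_1["motd"] = ("hello from DC-1", time.time())
    replica_2["motd"] = ("hello from DC-2", time.time())

    # Both replicas are up and answering, but not with the same data yet.
    print(replica_1["motd"][0], "|", replica_2["motd"][0])

    # When the partition heals, some merge rule reconciles them; only then
    # do identical queries return identical answers again.
    merged = max(replica_1["motd"], replica_2["motd"], key=lambda v: v[1])
    replica_1["motd"] = replica_2["motd"] = merged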


OPTION 2: Require Consistency
-----------------------------
In order to get consistency, a 2-node system behaves as if it were a 
bare-minimum quorum in a faux 3-node cluster, with the third node always MIA.  
As soon as one of the two “remaining” nodes goes down, you have two bad choices:

1. Continue to treat it as a cluster.  Since you have no second node to check 
your transactions against to maintain consistency, the cluster must stop write 
operations until the downed node is restored.  Read-only operations can 
continue, if we’re willing to accept the risk of the remaining system somehow 
becoming corrupted without going down entirely.

2. Split the cluster so that the remaining node is a standalone instance, which 
means you have no distributed system any more, so the CAP theorem no longer 
applies.  You might continue to have C-as-in-ACID, if your software stack is 
ACID-compliant in the first place, but you lose C-as-in-CAP until the second 
system comes back up and is restored to full operation.

We can have a CA cluster in one of the two senses above.  Only the first is a 
true cluster, though, and it’s not useful in many applications, since it blocks 
writes.

CP has no useful meaning in this situation, since a network partition will 
split the cluster, so that you don’t actually have partition tolerance.  This 
is why the minimum number of nodes is 3: so that a quorum remains on one side 
of the partition!
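The quorum arithmetic behind that, as a quick sketch (has_quorum here is just 
illustrative, not any particular cluster manager's API):

    # Majority quorum: proceed only if more than half of the *configured*
    # cluster is reachable on this side of the failure or partition.
    def has_quorum(cluster_size, reachable):
        return reachable > cluster_size // 2

    print(has_quorum(2, 1))   # False: 2-node cluster, one lost -> writes stop
    print(has_quorum(3, 2))   # True:  3-node cluster survives one lost node
    print(has_quorum(3, 1))   # False: the minority side of a 3-node partition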


Regardless of which path you pick above, you lose something relative to an 
n >= 2f+1 or an n >= 3m+1 cluster.  TANSTAAFL.

> Lastly, with 2x 2-node, you could lose two nodes
> (one per cluster) and still be operational. If you lose 2 nodes of a
> four node cluster, you're offline.

I don’t see how both statements can be true under the same CAP design 
restrictions, as laid out above.  I think if you analyze the configuration of 
your different scenarios, you’ll find that they’re not in the same CAP regime.

Lacking more information, I’m guessing this 2x2 configuration you’re thinking 
of is AP, and your 4-node, minimum-3-to-operate configuration is CP.

A 4-node AP configuration could lose 2 nodes and keep running, as long as all 
clients can see at least one of the remaining nodes, since you’ve given up on 
continual consistency.
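Putting numbers on that guess (a sketch of the two regimes, not of any 
particular cluster stack):

    # CP regime: a majority of the configured cluster must be reachable.
    def cp_available(cluster_size, up):
        return up > cluster_size // 2

    # AP regime: any reachable node keeps answering (consistency deferred).
    def ap_available(up):
        return up >= 1

    print(cp_available(4, 2))                   # False: 4-node CP, 2 down
    print(ap_available(1) and ap_available(1))  # True:  2x 2-node AP, 1 down each
    print(ap_available(2))                      # True:  4-node AP, 2 down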