Congratulations, you've just hit on my biggest pet peeve in
distributed systems discussions. Sorry if this gets a little hot. :)

On Tue, Sep 15, 2015 at 5:38 AM, Owen Synge <osy...@suse.com> wrote:
> On Mon, 14 Sep 2015 13:57:26 -0700
> Gregory Farnum <gfar...@redhat.com> wrote:
>
>> The OSD is supposed to stay down if any of the networks are missing.
>> Ceph is a CP system in CAP parlance; there's no such thing as a CA
>> system. ;)
>
> I know I am being fussy, but within my team your email was cited as
> saying that you cannot consider Ceph a CA system. Hence I make my
> argument in public so I can be humbled in public.
>
> Just to clarify your opinion, I cite
>
> http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
>
> suggests:
>
> <quote>
> The CAP theorem states that any networked shared-data system can have
> at most two of three desirable properties.
>
> * consistency (C) equivalent to having a single up-to-date copy of
>     the data;
> * high availability (A) of that data (for updates)
> * tolerance to network partitions (P).
> </quote>
>
> So I dispute that a CA system cannot exist.

Right, you can create a system that assumes no partitions and thus is
always consistent and available in the absence of partitions. The
problem is that partitions *do* exist and happen. The stereotypical
example is that a simple hard power-off on one server is
indistinguishable from a (very small) partition to the other nodes.
Even leaving aside stuff like that, networks partition, or undergo
partition-like events (huge packet loss over some link). When that
happens, your system is going to...do something. If you don't want
that something to be widespread data corruption, it will be designed
to handle partitions.
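That indistinguishability can be sketched in a few lines of Python (a toy model, not Ceph code; the timeout value and names are invented for illustration): from a peer's perspective, both a hard power-off and a dropped link manifest identically as missing heartbeats.

```python
# Toy model (not Ceph code): a peer can only observe the time since the
# last heartbeat arrived. A crashed node and a partitioned-away node
# produce exactly the same observation, so they cannot be told apart.

HEARTBEAT_TIMEOUT = 5.0  # seconds; an assumed, illustrative value

def peer_seems_up(last_heartbeat_at: float, now: float) -> bool:
    """True if we heard from the peer recently enough to trust it."""
    return (now - last_heartbeat_at) < HEARTBEAT_TIMEOUT

# Case 1: the peer was hard power-cycled at t=10.
# Case 2: a switch dropped the peer's link at t=10.
# At t=20, the local node observes the same thing either way:
print(peer_seems_up(last_heartbeat_at=10.0, now=20.0))  # False
```

Any failure detector built on timeouts has this property, which is why a "CA" design that merely assumes partitions away still has to do *something* when the timeout fires.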

There are no real "CA" systems in the world.

> I think you are too absolute even in your interpretation of this vague
> theorem. A further quote from the author of said theorem from the same
> article:
>
> <quote>
> The "2 of 3" formulation was always misleading because it tended to
> oversimplify the tensions among properties.
> </quote>

Right. This is the cause of a lot of problems for students of
distributed systems. Another quote from that article:

>CAP prohibits only a tiny part of the design space: perfect availability and 
>consistency in the presence of partitions, which are rare.

Lots of users forget that the CAP theorem is very precise, and that
precision is important. Some quick-and-dirty (but precise enough)
definitions:
* Available: any request received by a well-behaved node of the system
  will receive a (correct!) response within some bounded amount of
  time.
* Consistent: all nodes in the system agree on the status of a
  particular piece of state. (Eg, that an object is at version 12345
  and not 12344.)
* Partition-tolerant: the system continues to function correctly in the
  presence of message loss between some set of nodes.
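To make the Available definition concrete, here is a toy read handler (an illustrative sketch, not Ceph's logic; the quorum rule and all names are invented). A node that refuses, or blocks, when it cannot prove its copy is current stays consistent but fails the bounded-time-response requirement:

```python
# Toy sketch (not Ceph code): a consistent read handler under an
# invented quorum rule. When too few replicas are reachable, the node
# cannot prove its copy is up to date, so it refuses to answer --
# exactly the behavior CAP's Availability definition disallows.

class NotAvailableError(Exception):
    """Raised instead of blocking forever; either way there is no
    correct response within bounded time."""

def consistent_read(key, local_store, reachable_replicas, quorum=2):
    if reachable_replicas < quorum:
        raise NotAvailableError(f"cannot serve {key!r} during partition")
    return local_store[key]

store = {"obj": "version-12345"}
print(consistent_read("obj", store, reachable_replicas=3))  # normal case
# consistent_read("obj", store, reachable_replicas=1) would raise.
```

Whether the node raises an error or simply blocks makes no difference to the theorem: either way the well-behaved client did not get a correct answer in bounded time.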

> As I understand it:
>
> Ceph as a cluster always provides "Consistency". (or else you found a
> bug)
>
> If a ceph cluster is operating it will always provide acknowledgment
> (it may block) to the client if the operation has succeeded
> or failed hence provides "availability".

This is the part you're missing: blocking a request is not allowed
under the CAP theorem's definition of availability. If a PG might have
been updated by a set of nodes which are now partitioned away, we
can't respond to the client request (despite it being a valid,
well-behaved request) and so the system is not big-A Available.
Now, we are little-a available for *other* kinds of work. The cluster
keeps going and will process requests for all the state which it knows
it is authoritative for. But we do not satisfy the availability
criterion of the CAP theorem. This is part of the wide design space
which CAP does not demonstrate is impossible.

> if a ceph cluster is partitioned, only one partition will continue
> operation, hence you cannot consider the system "partition" tolerant
> as multiple parts of the system cannot operate when partitioned.

Nope, that's not what partition-tolerant means in the context of the
CAP theorem. This shortcut — in which we treat one side of the
partition as a misbehaving node — is pretty common, but what you're
citing here is actually associated with the "Available" side of the
spectrum.

The proof of the CAP theorem is relatively elegant and can be
summarized as: if a node disappears/is partitioned, you must either
ignore any updates it handled (sacrificing consistency) or refuse to
answer until the node reappears (sacrificing availability). No
combination of clever data replication or distribution can eliminate
the choice a system has to make when it goes to look at data and
discovers the data isn't there.
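That forced choice can be written out as two read strategies (a toy model, not Ceph code; the replicas and version numbers are invented). Suppose replica B accepted the write of version 2 and is now partitioned away, while replica A still holds version 1 and receives a read:

```python
# Toy model (not Ceph code) of the CAP proof's forced choice. Replica A
# holds version 1; the only copy of version 2 sits on partitioned-away
# replica B. No replication scheme removes this fork in the road.

LOCAL_COPY = ("v1", 1)      # (value, version) held by replica A
PEER_REACHABLE = False      # replica B is beyond the partition

def read_cp():
    """Sacrifice availability: refuse until the partition heals."""
    if not PEER_REACHABLE:
        raise TimeoutError("data may be stale; refusing to answer")
    return LOCAL_COPY

def read_ap():
    """Sacrifice consistency: answer from the possibly-stale local copy."""
    return LOCAL_COPY       # returns v1 even though v2 exists on B
```

Ceph's choice is `read_cp`: block (or error out) rather than hand back a value it cannot prove is current.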

Now, that discussion of Ceph's classification under the CAP theorem
obviously leaves out lots of stuff: Ceph is little-a available in that
it takes a lot more than one disk failure to render data inaccessible!
Much has been made of this design space, with many vendors and
developers pretending that this means they've somehow beaten CAP. But
when faced with requests for data that are not currently accessible,
Ceph chooses to block and remain Consistent over making up a value and
remaining Available; every distributed system must make that choice
one way or another. Most CP systems endeavor to remain available for
as long as possible; AP systems expend varying amounts of effort to be
consistent up until they're faced with either blocking or making up a
value.
-Greg