On Sep 24, 2008, at 10:15 , Ayende Rahien wrote:
On Wed, Sep 24, 2008 at 11:04 AM, Jan Lehnardt <[EMAIL PROTECTED]> wrote:
How do you ensure that across a cluster, all nodes will select the same
version?
Assume that I have the following sequence of events:
- create doc A (v1)
- update doc A from V1 (v2)
- update doc A from v1 (v3) - conflict
- update doc A from v1 on separate machine (v?) - conflict
How does it get resolved?
There are two types of conflicts here: update conflicts and replication
conflicts.
You cannot update doc A from V1 to V3.
Can I try to update a document that was already updated?
Let us say that I get v1, update it, and try to save, while at the same
time someone else saved.
What is going to happen? An optimistic concurrency error? Or does it save
and produce a conflict?
Optimistic concurrency error. You need to specify the revision you want
to update. If it doesn't match, CouchDB won't let you update.
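
The revision check described above can be sketched as a toy in-memory
store. This is only an illustration of optimistic concurrency, not
CouchDB's code: the names DocStore, ConflictError and put, and the
integer revision numbers, are assumptions made for the example.

```python
class ConflictError(Exception):
    """Raised when the caller's revision is not the current one."""


class DocStore:
    """Toy in-memory store (hypothetical; not CouchDB's implementation)."""

    def __init__(self):
        self._docs = {}  # doc_id -> (rev_number, body)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, body, rev=None):
        # The caller must name the revision it is updating from.
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            raise ConflictError(f"expected rev {current_rev}, got {rev}")
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, body)
        return new_rev


store = DocStore()
v1 = store.put("A", {"n": 1})          # create doc A (v1)
v2 = store.put("A", {"n": 2}, rev=v1)  # update doc A from v1 (v2)
try:
    store.put("A", {"n": 3}, rev=v1)   # second update from v1 is rejected
    rejected = False
except ConflictError:
    rejected = True
```

The second writer has to fetch the current revision and retry; nothing
is saved on its behalf.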
- server 1: create doc A(V1)
- replicate server 1 and server 2
(doc A now lives on server 1 and server 2 with V1)
- server 1: update doc A(V1) to doc A(V2a)
- server 2: update doc A(V1) to doc A(V2b)
(now there are two V2 for doc A. No problem so far)
- replicate server 1 and server 2:
- CouchDB sees that V2a and V2b are different and picks
either one to be the latest revision. Say V2a gets chosen.
- Server 1 and server 2 now both have doc A (V2a) as the
latest revision, but doc A is flagged with a _conflict attribute.
- You need to go in and resolve that by either approving CouchDB's
automatic choice or by using a previous revision. There is no merging
and there is no auto-conflict-resolution. Only auto-conflict-detection.
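
A minimal sketch of the two-server scenario above. The dictionaries, the
replicate and view helpers, and the rule "smallest revision id wins" are
all assumptions for illustration; the rule is picked only so that V2a
wins as in the example, and it is not claimed to be CouchDB's actual
rule. The point is that both servers apply the same rule to the same set
of revisions, so they agree on the winner and both keep the loser around
as a conflict.

```python
def replicate(a, b):
    """Exchange revisions so both replicas hold the same set of leaves."""
    merged = a["leaves"] | b["leaves"]
    for replica in (a, b):
        replica["leaves"] = set(merged)


def view(replica):
    """Return (winner, conflicts) using an assumed deterministic rule."""
    ordered = sorted(replica["leaves"])  # assumption: smallest id wins
    return ordered[0], ordered[1:]


server1 = {"leaves": {"V2a"}}  # server 1 updated doc A(V1) to V2a
server2 = {"leaves": {"V2b"}}  # server 2 updated doc A(V1) to V2b

replicate(server1, server2)
winner1, conflicts1 = view(server1)
winner2, conflicts2 = view(server2)
```

Nothing is merged here: the losing revision is only reported, and it is
up to the application to approve the winner or pick another revision.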
Okay, I see how this works for 2 servers. What happens if we have three?
So now we have V2a, v2b, v2c.
Server 1 replicates with server 2 (v2a is chosen)
Server 3 replicates with server 3 (v2? is chosen)
you mean
Server 1 replicates with server 3
or
Server 2 replicates with server 3 :-)
What is going on with server 2? On next replication, it will get
whatever was chosen by 1 & 3 ?
In any case, after replicating, servers 1 & 2 look the same.
So when you replicate either with a new server, you fall back
to the case of V2a vs. V2b (where V2b is actually V2c).
If there are new conflicts introduced by conflict resolutions,
you start over with V2a and V2b being the first resolution and
the newly conflicting change.
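
The three-server case can be checked with a small simulation. As before,
this is a sketch under assumptions: each server is modelled as a plain
set of revision ids, replicate is a hypothetical helper, and the winner
rule (smallest id) is invented for the example. It tries every order of
the three pairwise replications and records which revision each server
ends up treating as the winner.

```python
from itertools import permutations


def replicate(a, b):
    """Pairwise replication: both sides end up with the union."""
    merged = a | b
    a.update(merged)
    b.update(merged)


def winner(leaves):
    """Assumed deterministic rule: the smallest revision id wins."""
    return min(leaves)


pairs = [(0, 1), (0, 2), (1, 2)]  # indexes of servers 1, 2 and 3
outcomes = set()
for order in permutations(pairs):
    servers = [{"V2a"}, {"V2b"}, {"V2c"}]
    for i, j in order:
        replicate(servers[i], servers[j])
    outcomes.add(tuple(winner(s) for s in servers))
```

Because the rule depends only on the set of revisions a server holds,
the order of replication does not matter once every server has seen
every revision.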
Two questions that are of particular interest to me, and that I haven't
been able to answer from the code so far, are:
- How is the data stored? I think that it is a binary tree on disk, but
I am not following how updates to that can be done safely with ACID
guarantees.
Writes are serialized. Only one write can happen at a time and it is
completely flushed and committed to disk (2 x fsync()) before another
write comes in. Writes are append-only. No data is ever overwritten.
This gives us the ACID & MVCC buzzcronyms :-)
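
A rough sketch of serialized, append-only writes. The record layout (a
4-byte length prefix), the function names and the file name are
assumptions for illustration and say nothing about CouchDB's real file
format. The sketch also issues a single fsync per write, whereas the
description above mentions two.

```python
import os
import tempfile
import threading

_write_lock = threading.Lock()  # only one writer at a time


def append_record(path, payload: bytes) -> int:
    """Append one length-prefixed record; return the offset it starts at."""
    with _write_lock:
        with open(path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)
            f.write(len(payload).to_bytes(4, "big") + payload)
            f.flush()
            os.fsync(f.fileno())  # on disk before the next write starts
    return offset


def read_record(path, offset: int) -> bytes:
    """Read the record that starts at the given offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        size = int.from_bytes(f.read(4), "big")
        return f.read(size)


path = os.path.join(tempfile.mkdtemp(), "db.log")
first = append_record(path, b"doc A v1")
second = append_record(path, b"doc A v2")  # v1 is left untouched
```

A reader holding the offset of the old record can still read it after
the new one is appended, which is the property the MVCC remark relies
on.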
Can you speak more on the actual file format? I don't think that I
understand how you can have append only with binary trees.
I have to refer you to Damien or the source for that one. :-)
Trawling through the sources now, but it is pretty hard to figure out.