Hello,

I sent the message below about a month ago, during the e-mail outage,
and didn't get any response, so I'm retrying.



----------------------------------------------------------------

I'm new to this list, although I've been using darcs for a few years.
I'm a mathematician at Barnard College, Columbia University.

I've been thinking some about the theory of patches in a more general
setting.  I apologise that I don't know very much about the internals
of darcs, but I hope I've abstracted the correct portions in what I
write below.

The basic observation is that the set of versions which are reachable
from a given set of patches, via applications and commutations, forms
a _cube complex_: an object you get by gluing together cubes of
various dimensions (including squares) along their faces.  I can give
a precise definition, but first let me sketch how it arises.

From a given repository, think about _all_ the different versions that
can be obtained by applying some subset of the patches (via the
various commutations).  Consider these versions as a set, and connect
two of them by an edge if they differ by a single patch, forming the
_version graph_, the graph of all possible versions.  Now every time
you have a version v and patches p1, p2, p1', p2' so that

  p1(p2(v)) = p2'(p1'(v))

forming a commuting set, glue in a square to the graph with boundary
the four versions involved (that's v, p2(v), p1(p2(v)), p1'(v)).
Similarly when you have 3 patches that can be commuted to apply in any
order, glue in a 3-dimensional cube, and so forth.  You end up with a
big space, the _version complex_, which I claim is the fundamental
object that darcs deals with.

The version complex can be exponentially large--if there are n
different patches, and they can be applied in any order, you'll have
2^n different possible versions you can reach.  So you want to avoid
storing all the versions in the version complex, even in summary form.
But I'd like to understand the general theory first, before worrying
about implementation.  (Perhaps, for instance, it's enough to store
only the largest cubes, cubes which aren't faces of some larger cubes,
to reduce from an exponential amount of storage.)

I don't know exactly which cube complexes you can get as a version
complex.  There are some annoying corner cases: for instance, if you
have a file with two identical sections,
schematically ABAC, and three different patches deleting, respectively,
the first A, the B, and the second A, you get the following graph of
versions by commutations:

        ABAC
        /|\
       / | \
      /  |  \
    BAC AAC ABC
     |\ / \ /|
     | /   \ |
     |/ \ / \|
     AC  BC  AC
      \  |  /
       \ | /
        \|/
         C

(Use a monospaced font for viewing.)  In this cube, every face but the
face containing AAC, AC, AC, and C commutes; this last face does not
commute, at least when it is considered independently.

But thinking about things from this point of view also makes me a bit
skeptical of some fundamentals of darcs.  Specifically, the identity
of "patches" that can be applied to a variety of different versions
seems questionable to me.  In terms of the cube complex, a "patch"
is a set of parallel edges, where we call two edges "parallel" if we
can connect them by a chain of edges so that two adjacent edges in the
chain are opposite edges of some square in the complex.

The identity of patches is particularly questionable if you could have
a version complex like the one in the attached PostScript picture.  In
that picture, if you follow the parallel edges around you see that
there are two ways of applying "the same" patch to the same version,
which would be bad!

But this can actually occur.  Suppose we have in our repository the
cycle of versions schematically shown below.  Note that the first
version v0 has two identical blocks of text, both labelled 'A'.

v0: ABCAD
 v0 -> v1: delete the first 'A'
v1: BCAD
 v1 -> v2: delete 'B'
v2: CAD
 v2 -> v3: add 'B' between 'A' and 'D'
v3: CABD
 v3 -> v4: delete 'C'
v4: ABD
 v4 -> v5: add 'C' between 'B' and 'D'
v5: ABCD
 v0 -> v5: delete the second 'A'

You may object that the version graph should never have cycles in it.
I think you're forced to get cycles when the user performs a merge of
conflicting versions; what is a merged version but a new version that
can be related by an elementary patch to each of the two previous
ones?

Now add the patch

p: v0 -> v0': change the first 'A' to 'a'
  v0': aBCAD

If you commute this patch 'p' around the cycle of versions, you'll see
that you end up at a different version

  v0'': ABCaD

essentially as shown in the attached diagram: the inner ring has the
versions listed above, while the outer ring has the same versions with
'A' changed to 'a'.  When there are two 'A's, everything is
ambiguous...

This seems quite troublesome to me; I don't know how to resolve it.

Peace,
        Dylan Thurston

Attachment: square.eps
Description: PostScript document

Attachment: signature.asc
Description: Digital signature

_______________________________________________
darcs-devel mailing list
[email protected]
http://www.abridgegame.org/cgi-bin/mailman/listinfo/darcs-devel

Reply via email to