On 04.07.20 at 23:43, James Cook wrote:
> On Sat, 4 Jul 2020 at 09:59, Ben Franksen <ben.frank...@online.de> wrote:
>>> On 04.07.20 at 00:10, James Cook wrote:
>>>>> I'd say this is a feature, not a bug. If you mark the conflict between A
>>>>> and B as resolved, *without* recording a new patch, then this means you
>>>>> are satisfied with the resolution that reverts both changes. There is no
>>>>> reason why pulling C should conflict now. If, on the other hand, you are
>>>>> not satisfied with this resolution, you may want record a "proper"
>>>>> resolution, usually a mixture of A and B. This will then typically
>>>>> conflict with C.
>>>
>>> That's a great point. I'm much less worried now.
>>
>> You should be, though. I think what I wrote above is nonsense.
>>
>> [...]
> 
> I'm not sure I understand your concern right.

Ah, forget it. My concern was about cancelling inverses. I am biased by
my own theorizing about patches: I once thought that patch sequences
should form a groupoid, which requires that we always cancel inverses.
This is apparently not what you are aiming at.

However, I do think you need to cancel inverses in /addresses/ as part
of the minimization process.

> Let me try to step
> through what I'd like to happen.
> 
> Suppose the starting context is s and patches A and B conflict and
> have ending contexts a and b.
> 
> In repository R1, the user tries to merge A and B, sees conflict
> markers, and records a conflict resolution. The conflict resolution
> consists of A^, B^, and a new patch C which replaces A and B. The
> repository now looks like this (after re-ordering a bit):
> 
> s A a A^ s B b B^ s C c

There is nothing wrong with that as long as you always keep all patches
in the sequence and never annihilate inverses.

> Now, suppose the user tries to pull from another repository R2 that
> has just patch A. Here, it is very important that R1 "remember" that
> the A\/B conflict was already resolved. Otherwise, it may treat C \/ A
> as a new conflict.
> 
> But can't we just observe that the name "A" is already present in R1,
> and therefore there's no need to pull in A? Wouldn't that ensure that
> we never again try to resolve a conflict involving A?

Of course.

If you keep "inverses" in the repo (i.e. don't cancel them), then you
are now in a situation where you want to merge (A;A^;B;B^;C)\/(A). The
common subsequence is (A) and the uncommon parts are (A^;B;B^;C)\/() and
their merge is trivially ()/\(A^;B;B^;C). So the result is A;A^;B;B^;C
i.e. equal to R1.
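To make this concrete, here is a small illustrative sketch (hypothetical
names, not darcs code) of that merge step on patch-name sequences,
assuming commutation has already moved the common patches to the front:

```python
# Illustrative sketch only: repos are modeled as sequences of patch
# names, with inverses kept in the sequence (e.g. "A^" is A's inverse).

def merge(ours, theirs):
    """Merge two patch sequences that share a common prefix."""
    # Find the longest common prefix (the shared history).
    i = 0
    while i < min(len(ours), len(theirs)) and ours[i] == theirs[i]:
        i += 1
    common, ours_rest, theirs_rest = ours[:i], ours[i:], theirs[i:]
    # Patches already present by name need not be merged again.
    new_remote = [p for p in theirs_rest if p not in ours_rest]
    return common + ours_rest + new_remote

r1 = ["A", "A^", "B", "B^", "C"]    # resolved conflict, inverses kept
r2 = ["A"]                          # remote repo with just A
print(merge(r1, r2))                # ['A', 'A^', 'B', 'B^', 'C'] == R1
```

The uncommon remote part is empty here, so the merge is trivial and the
result equals R1, exactly as described above.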

> I don't know how darcs figures out what patches need to be pulled,
> especially if R1 is lazy.

Don't let yourself be confused by lazy repos.

Conceptually, darcs reads the complete remote repo, figures out the
common patches, commutes all uncommon patches in both repos to the end,
and then runs the actual merge algorithm only on the remaining
(uncommon) sequences. The camp paper has a chapter with many nice
pictures where this is explained in detail.

In reality we almost never need to read the full remote repo. This is
because the order of patches in a repository is stored in so-called
inventories. An inventory is a file that lists patch names along with
the content hash of each patch. Inventories can be nested and are stored
hashed, too (i.e. the file name is equal to the content hash). This
makes it easy to compare two repos efficiently to find the latest point
at which they diverge. We only need to download remote inventories until
we find an identical one in our local repo. Since cloning preserves
inventories and operations deep inside a repo are uncommon (and also
pretty expensive), this search usually terminates quickly.
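The search can be sketched roughly like this (a toy model, not darcs'
actual code; inventory chains are modeled as lists of hashes, newest
first):

```python
# Hypothetical sketch of the inventory comparison described above.
# Inventories are content-addressed (file name == content hash), so two
# repos agree from the first inventory hash they have in common.

def find_common_inventory(local_chain, remote_chain):
    """Walk remote inventory hashes newest-to-oldest until one is
    found locally; returns the common hash and what had to be fetched."""
    local = set(local_chain)
    downloaded = []
    for h in remote_chain:          # only these need to be downloaded
        downloaded.append(h)
        if h in local:
            return h, downloaded
    return None, downloaded         # no common inventory: read everything

local  = ["inv9", "inv5", "inv2", "inv0"]
remote = ["inv7", "inv5", "inv2", "inv0"]
print(find_common_inventory(local, remote))  # ('inv5', ['inv7', 'inv5'])
```

Because cloning preserves inventories, the common hash is usually found
after downloading only one or two remote inventories.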

Lazy repos are missing only patches listed in closed inventories, which
means that pulling from a lazy repo normally works fine. However, it
/can/ fail if we actually need access to such a patch (e.g. in order to
commute it) and we cannot download it from any source repository listed
in the local repo (which includes the per-user cache).

> I suspect the current implementation might
> need to be extended so that a repository stores a set of all names of
> patches that have been cancelled out, i.e. the names of all patches A
> such that both A and A^ are present. But this is speculative because I
> don't know how that algorithm works.

See above. You can add such a set as an optimization but the algorithm
doesn't need it *if* you keep all patches in the sequence that
represents the state of your repo.

>> If you want to have proper inverses (for conflict resolution) and still
>> remember all of the history, you must abandon the idea that a repo is a
>> sequence of patches (or rather: an equivalence class of such sequences,
>> modulo permutations). So your idea of phantom patches (a patch together
>> with its tombstone) amounts to saying that a repo is not a sequence of
>> patches, but a tree (with multiple dead branches), and to merge a patch
>> means to merge it with all branches of the tree (you'll have to work out
>> what that means precisely).
> 
> I had been thinking about representing a repo as a tree.

A sequence of patches is a path in the patch graph. If you can walk
back, then what you get has the form of a tree. This is no longer true
if you can walk in circles, though. If we assume that the state of a
repository is given by a sequence of patches (modulo commutation), then
we could store the repo as a tree. This representation uses less memory
and requires less commutation (we don't have to store or commute inverse
patches), but the algorithms become more complicated. This is an
implementation issue, so not of interest for the theory.

> I think there is a certain equivalence between trees and patch
> sequence addresses:

I am pretty sure you only get unique minimal patch addresses if you
eliminate primitive inverse pairs (see below), which means minimal
sequence addresses are necessarily linear. This does not contradict what
I wrote above: the repository state is not a sequence address, but a
plain sequence.

> given a tree, you can turn it into a patch
> sequence address by doing a depth first search (record positive
> patches as you go deeper into the tree, and negative patches when you
> return back upward). I suspect the other direction can be done too,
> after you extend (Qi) in a patch sequence address to be a cycle and
> then do some commuting, but I'm not sure.

I guess you want to avoid proper cycles, i.e. cycles other than those of
the form patches;patches^, but that's just an intuition.
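The depth-first serialization you describe can be sketched as follows
(an illustrative toy model: a tree is a dict mapping a patch name to its
child subtrees, and the rightmost spine is the live branch, so only dead
branches get closed with an inverse):

```python
# Sketch of turning a tree of patches into a patch sequence: record a
# patch when descending into a branch and its inverse when returning.
# The last child at each level of the live spine stays open.

def tree_to_sequence(tree, live=True):
    seq = []
    items = list(tree.items())
    for i, (patch, children) in enumerate(items):
        last = (i == len(items) - 1)
        seq.append(patch)                       # go deeper: positive patch
        seq.extend(tree_to_sequence(children, live and last))
        if not (live and last):
            seq.append(patch + "^")             # dead branch: close with inverse
    return seq

# Root with dead branches A and B and live branch C,
# i.e. the situation  s A a A^ s B b B^ s C c  from earlier:
print(tree_to_sequence({"A": {}, "B": {}, "C": {}}))
# ['A', 'A^', 'B', 'B^', 'C']
```

Note the resulting sequences only ever contain "cycles" of the form
patches;patches^, which is consistent with wanting to rule out proper
cycles.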

> An advantage of the patch sequence address representation over the
> tree representation is that I find it easier to think about how
> commuting can modify a cycle. With the tree representation you'd want
> to push conflicts as far down the tree as possible. This should be
> related to the minimality condition for patch sequence addresses, but
> I haven't wrapped my head around it.

Well, suppose you have primitive A;(B\/C) where neither A;B nor A;C
commute. If we start with the sequence A;B;B^;C and want to minimize
that to A;C, we have no choice but to cancel inverses. Otherwise
we'd have the situation that A;B;B^;C and A;C are not equivalent as
context addresses. (I am not using the full notation here because
primitive contexts are implicit in primitive patches, assuming all
patches are well-typed, and the sets X and Y are irrelevant here.)
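The cancellation step itself is the usual free-group style reduction; a
minimal sketch (hypothetical names; commutation, which may be needed to
bring an inverse pair next to each other, is not modeled):

```python
# Cancel adjacent primitive inverse pairs (X;X^ or X^;X) in a patch-name
# sequence, e.g. A;B;B^;C -> A;C. Same idea as free-group normal forms.

def cancel_inverses(seq):
    out = []
    for p in seq:
        adjacent_inverse = out and (out[-1] == p + "^" or p == out[-1] + "^")
        if adjacent_inverse:
            out.pop()               # annihilate the inverse pair
        else:
            out.append(p)
    return out

print(cancel_inverses(["A", "B", "B^", "C"]))        # ['A', 'C']
print(cancel_inverses(["A", "A^", "B", "B^", "C"]))  # ['C']
```

A single left-to-right pass with a stack suffices, because popping an
element can expose at most one new adjacent inverse pair, which the next
input element is then checked against.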

Cheers
Ben

_______________________________________________
darcs-users mailing list
darcs-users@osuosl.org
https://lists.osuosl.org/mailman/listinfo/darcs-users
