Hi All the recent discussions here gave me a much better picture of what kind of features we should strive for when we add in-repo branches. But perhaps I should not (yet) talk about branches. Let us talk instead about "patch sequences".
The way Darcs works today is that it knows of exactly one sequence of patches per repo. [As an aside: It is absolutely necessary for Darcs to know in which sequence patches are applied, since what we internally refer to are not patches in the abstract sense as the user sees them, but rather their concrete representations. And --in contrast to abstract patches-- these apply only in a much more narrowly defined context.] Internally, this sequence is broken up into segments, where each segment can refer to a parent segment (except the last one at the start of the repo; the segments are internally called "inventories"). This segmentation is done for efficiency, mostly, so from a user perspective we can pretty much ignore it (with a few exceptions, see below). It should perhaps be noted at this point that the sequence does not /contain/ the patches, it merely references them. The patches themselves are stored independently. Like a list of pointers/references to elements in C or Java. What happens when we "destroy" a patch, using any one of: amend, obliterate, unrecord, or rebase suspend? Well, nothing is actually deleted or destroyed, we merely forget about the existence of the original patch, since we no longer have a reference to it. But nothing forces us to "forget" these patches! Instead, each time we do something "destructive", we could add a new reference to some internal data structure that is very similar, if not identical, to that which currently gives us the ordered sequence of patches in our repo. In particular, since we already have the ability to refer to a parent segment, all we need to do is to split the current segment and create two new ones that both refer back to the old segment. Here is a picture. We start out with the sequence A<B<---C<D<* Here, A is the "oldest" patch and D is the "latest" one; the * denotes the head of the sequence, which at some point in the past has been split into A<B and C<D. The arrow point backwards here, like list pointers. Now let's amend patch D, giving us a new patch E. Today, what we get is A<B<---C<E<* D where D is decoupled from our sequence and can no longer be referred to; it might as well not exist (of course, some other repo may still have a reachable copy). What I propose to do is change this to A<B<---C<---E<* ^ \ ---D<x where the small letter 'x' denotes a new "head" (of another sequence). I come to the problem of naming this sequence/head later. For now, let us assume we have some name ("x") for it and see what properties this has. The old patch D is obviously not "applied" in our repo, only the sequence of patches starting at * is. Now, as we all know and love, Darcs can --sometimes-- re-order the sequence of patches. Let us see where this leads us. For instance, assume we want to get rid of a patch in the past, say C. To do this, Darcs must commute this patch to the head (*). Assume this suceeds, i.e. E does not depend on C. We get two new patches E'<C', where E and E' as well as C and C' both represent the same "abstract patch". A<B<---E'<---C'<* We can now drop C' and get A<B<---E'<* ^ \ ---C'<y But this picture is incomplete as yet. What about D? We must decide to which parent sequence we attach it, since its original parent sequence no longer exists. We face a similar situation today when we want to do a destructive operation with a patch behind a tag. If we insist on doing it, we must destroy the tag, too, since it depends on everything that comes before it. This is why Darcs chooses to start a new segment whenever it sees a "clean" tag at the top ("clean" means that the tag actually depends on all patches preceding it): re-ordering patches forward across a tag is impossible. In our situation things are not as inflexible, though. Remember what we did when commuting C<E: we created two new patches. The old ones are still there and while we cannot refer to them in our main/current sequence, we /can/ reference them in another sequence. So this is what we could do: ---C'<y / v A<B<---E'<* ^ \ ---C<D<x Looks nice, doesn't it? Note how the sequence of patches at y remains exactly the same, only the segmentation has changed. We need to do only a minimal amount of work. Still the above solution has its problems and they have to do with naming. For a satisfying UX we need to have a way to refer, in a unique manner, to a head such as the "x" and "y" I have been using. We could of course create /some/ unique ID, each time we create a branch. I think this is just too awkward in practice: if I have two identical clones of a project and obliterate the same patch in both repos independently, then I get two differently named branches. If I pull patches between them with some extra option that also pulls branches/heads, I now have two identical but differently named branches in a single repo. This is more or less what git users have to cope with. Git has features on top of that: you can tell it that local branch "x" tracks remote branch "y", that is, they become (locally) identified (or something like that). I would very much like to avoid this, it is difficult to understand and requires too much initial setup and maintenance. Can we solve this without inventing new names? What could work, in principle, is to let Darcs list our sequences one by one and we choose whichever we want to refer to interactively. But having to do that every time would be even more awkward than the git solution. One obvious idea is to name branches by referring to their top-most patch (hash). This requires that no two branches have the same patch on top so we can uniquely refer to the head by referring to this patch. Is there a way to ensure that? The scheme as presented so far does not have this property. For instance, in the picture above obliterate D in head x, then x starts with C and y with C', both of which are "identical" as far as the user is concerned, that is, they have the same hash. We could even arrive at situations where the whole patch sequence is identical. Let us review the earlier example. We have A<B<---C<---E<* ^ \ ---D<x and want to commute C<E, so we can obliterate C. Suppose C also commutes with D, say D''<C''. In this case we can just drop C'' from x and arrive at ---D''<x / v A<B<---E'<* ^ \ ---C'<y and our invariant holds. But what if we cannot commute C<D, that is, D depends on C? Well, it turns out there is yet another alternative that is even more elegant: We throw away the C' and instead create a new branch for C exactly where it was before: ---C<D<x / v A<B<---E'<* ^ \ ---C<y and now we can join the common "tail" of "x" and "y": A<B<---E'<* ^ \ ---C<y ^ \ ---D''<x This works because for the purpose of keeping a reference to a patch it doesn't matter which representative we chose. If we obliterate the same patch in two clones of a repo, and then pull from one to the other, we will still get only one branch! I would like to point you now to a paper containing a proof that we can always choose to branch in such a way that if a patch is contained in two different branches, then it can be commuted back into the common "tail" of both. Unfortunately this paper does not exist yet. However, the good news is that we can be pretty sure the theorem holds. We have a test-suite where properties like this are tested using QuickCheck. And we are /already/ relying on it because without this fundamental property operations such as pull, push, or apply would not work at all. Conclusion ---------- If we choose to add in-repo branches to Darcs, we do /not/ need to be concerned with branch identifiers! It is enough to able to name patches to uniquely identify a branch. It doesn't even have to be the top-most patch, just any patch in the initial segment of a branch. I think this will result in a UI that is at least as easy to use as that of Mercurial and still consistent and efficient as with git. We can and should allow to attach symbolic "human readable" names to branches, but they will have no fundamental relevance. It's a bit like using 'darcs pull -p patchname', you can use it and it is convenient but if in doubt there is always the hash to uniquely identify a single patch. A spontaneous idea I just had is this: if we add a feature to give symbolic names ("aliases") to patches, then these could double as branch names. We could maintain these associations in a simple text file with one line per alias and add a pref 'aliasfile' so we can keep the file under version control, quite similar to 'boringfile'. I'll write more about what I think a good UI to in-repo branches could be in some subsequent message. Cheers Ben _______________________________________________ darcs-users mailing list darcs-users@osuosl.org https://lists.osuosl.org/mailman/listinfo/darcs-users