Hi all,

(This is a rather lengthy email, and not terribly well organized, as I haven't time to go back and rewrite the beginning. At the end are some concrete things that I think need doing. If anyone wants to volunteer for any of these, that would be great. I think there is now some work we can do that will make the coding of darcs 2.0 much easier, and concrete discussion of concrete issues is welcome. I'd like to move more of the discussion of this stuff onto the mailing list rather than IRC, so that more people will be involved and there'll be a somewhat better record.)

More discussion's been going on over IRC. Simon Marlow (JaffaCake) asked why we need a real tree: what if there were just one branch point? We couldn't come up with much of an answer, and are thinking this may be the best (and simplest) solution.

One issue is that this solution doesn't handle "dead ends" at all. But since a dead end is just an already-resolved conflict, we don't want heavy machinery for it, if at all possible. One solution would simply be to store the contents of all resolved conflicts in the resolution patch itself. I lean towards this solution.

If we go with this, the shape of a repository will be as follows. An unconflicted repository will have precisely the shape of our current repositories, which is to say a sequence of patches. It'll still be a bit trickier, because some of those patches will be hiding other patches, which will need to reappear if the resolution is unpulled, or if a patch that depends upon them is pulled into the repo. But storage-wise it's dead simple.

When there's a conflict, the repository will have, besides the aforementioned sequence of patches, a set of conflicting patch sequences. My leaning (although it's only one of several options) is to restrict these patch sequences to be "minimal depth". That is to say:

(a) Each non-terminal patch in a conflicting sequence must be depended upon by the terminal patch in that sequence, and

(b) Every terminal patch must not be depended upon by any other patch in the repository.

This certainly isn't the only route we could take, but it seems to me like the simplest. We could consider disallowing pulling from conflicted repositories, in which case the "minimal depth" choice wouldn't need to be set in stone. (This option, requiring resolutions, is particularly appealing if we go with the crazy idea below.)

In the new scheme, I imagine an interactive "darcs resolve" which prompts the user with the alternative conflicting possibilities, and allows the user to choose between them.

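To make the "minimal depth" conditions (a) and (b) concrete, here is a rough sketch in Python (used only for brevity; this is not darcs code). It assumes we have some oracle telling us whether one patch depends on another. The names `depends_on`, `satisfies_a`, `satisfies_b` and `is_minimal_depth` are all hypothetical, and each conflicting sequence is assumed to be a list whose last element is the terminal patch.

```python
from typing import Callable, List, Sequence

PatchId = str
# Hypothetical oracle: depends_on(p, q) is True when patch p depends on patch q.
DependsOn = Callable[[PatchId, PatchId], bool]


def satisfies_a(depends_on: DependsOn, seq: Sequence[PatchId]) -> bool:
    """Condition (a): the terminal patch depends on every non-terminal patch."""
    if not seq:
        return False  # a conflicting sequence must at least have a terminal patch
    *non_terminal, terminal = seq
    return all(depends_on(terminal, p) for p in non_terminal)


def satisfies_b(depends_on: DependsOn,
                all_patches: Sequence[PatchId],
                seq: Sequence[PatchId]) -> bool:
    """Condition (b): no other patch in the repository depends on the terminal patch."""
    if not seq:
        return False
    terminal = seq[-1]
    return not any(p != terminal and depends_on(p, terminal) for p in all_patches)


def is_minimal_depth(depends_on: DependsOn,
                     mainline: Sequence[PatchId],
                     conflicts: Sequence[Sequence[PatchId]]) -> bool:
    """Check (a) and (b) for every conflicting sequence in the repository."""
    all_patches: List[PatchId] = list(mainline) + [p for seq in conflicts for p in seq]
    return all(
        satisfies_a(depends_on, seq) and satisfies_b(depends_on, all_patches, seq)
        for seq in conflicts
    )


# Toy dependency relation for illustration: c2 depends on c1, d1 depends on m1.
_EXAMPLE_DEPS = {("c2", "c1"), ("d1", "m1")}


def example_depends_on(p: PatchId, q: PatchId) -> bool:
    return (p, q) in _EXAMPLE_DEPS


if __name__ == "__main__":
    # Two conflicting sequences, each ending in a terminal patch nothing depends on.
    print(is_minimal_depth(example_depends_on, ["m1"], [["c1", "c2"], ["d1"]]))
    # Here d1 does not depend on c1, so condition (a) fails.
    print(is_minimal_depth(example_depends_on, ["m1"], [["c1", "d1"]]))
```

Note that an unconflicted repository (no conflicting sequences at all) passes the check trivially, which matches the claim that it has the same shape as a current repository.
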
Implementation issues:
=====================

1. Do we store "dead" patches within the resolution patch itself? I think Arjan and I lean towards this idea.

2. We're going to have to do considerably more rearranging and modifying of patches than in current darcs, in which a patch file is untouched after it's created in a repository. This is dangerous if someone is doing a get while that is happening (since we have no read locking). I think the best solution to this is to move to the "hashed inventory" idea we've discussed before, which allows an in-place update while gets and pulls are going on, provided we don't delete the old files. Or if we do delete them, the worst that happens is that the gets or pulls might end up failing due to inability to read the repo. This is a feature that can be added before we do the new conflict handling code, or in parallel. And it's a good thing to have in either case.

3. With the RepoFormat framework, we can plan ahead so that older versions of darcs will be able to interact with darcs 2.0 (the new conflict-handling version) as long as there aren't conflicts involved. The idea is that we'll define a format feature "no-new-mergers". A repo that has that feature (which will restrict writes only, not reads) will be writeable only by darcs 1.0.9 or later (or whenever we do this), and darcs won't be able to deal with conflicts in that repo. But it'll be writeable by both darcs 1.0.9 and darcs 2.0 (as long as there are no conflicts), and will be readable by even older versions of darcs. So we can implement this now, and benefit later by allowing a certain amount of interaction between the new and old versions of darcs, which will probably be crucial in allowing a sane transition plan for our users.

4. Disallow pulling from (or to?) conflicted repos? If we do this, much of darcs' logic will remain the same. We can certainly start with this as a choice, and implement it later.

5. (The crazy idea) The new scheme is going to have to treat primitive patches as the "first-class" objects, rather than named composite patches as is currently the case. When I refer above to sequences of patches, I now mean sequences of *primitive* patches, quite different from the situation in current darcs. This is going to require quite a shift in the code, particularly in how we deal with patch names. An idea I floated before was to move composite patches up to the UI level, eliminating them entirely at the fundamental level of the darcs core code, with each primitive patch having a unique name.

I've now got a new related idea, which has a very strong appeal. How about we make the "name" (patch id, or PatchInfo) of a patch no longer be part of its identity, but instead be a sort of tag that's attached to it? A given primitive patch could then have more than one name. This would give us "for free" the feature that patches that are identical except in name do not conflict. Primitive patches could now be members of more than one "named" patch. It'll require a restructuring, but that restructuring is required already by the whole new approach to conflicts.

What would this mean? I'm not quite sure. We'd need to attach some sort of set of names to each primitive patch. It also means that we wouldn't necessarily need to attach a number to each primitive patch to give it a separate unique identifier. We'd still be open to the danger of accidentally pulling just part of a named patch and not realizing it, since patches will be split up. One solution to that would be to include in a patch name the number of primitive patches included, and then count to make sure we got them all. That count would also need to be included if we numbered all the primitive patches.

6. Question: How will we store the inventory with these primitive patches going every which way? Perhaps we don't need to change anything?
Perhaps we can still store each named patch in a single file, with an annotation that the patch is missing something. In fact, perhaps we can arrange commutation with "resolutions" to handle everything for us, so we don't need to do anything with the on-disk format.

-- 
David Roundy
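
Going back to the "crazy idea" in item 5: a rough sketch of names-as-tags, in Python purely for brevity (not darcs code), might look like the following. Everything here is an assumption for illustration: `PatchName` stands in for PatchInfo plus the proposed count of primitive patches, `Repo` is a toy container, and a primitive patch is identified simply by its content string.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class PatchName:
    """Hypothetical stand-in for PatchInfo, carrying the number of
    primitive patches that belong to this named patch."""
    info: str
    prim_count: int


class Repo:
    """Toy repository: each primitive patch carries a set of names (tags)."""

    def __init__(self) -> None:
        # primitive patch content -> names attached to it
        self.tags: Dict[str, Set[PatchName]] = {}

    def add(self, prim: str, name: PatchName) -> None:
        # The same primitive under a second name just gains another tag,
        # so patches identical except in name do not conflict.
        self.tags.setdefault(prim, set()).add(name)

    def prims_of(self, name: PatchName) -> Set[str]:
        return {prim for prim, names in self.tags.items() if name in names}

    def is_complete(self, name: PatchName) -> bool:
        # Count the primitives we hold for this name against the count
        # recorded in the name, to detect a partially pulled named patch.
        return len(self.prims_of(name)) == name.prim_count


if __name__ == "__main__":
    repo = Repo()
    alice = PatchName("fix typo (alice)", 2)
    bob = PatchName("fix typo (bob)", 1)
    repo.add("hunk README 3", alice)
    repo.add("hunk README 3", bob)      # same primitive, second name
    print(repo.is_complete(bob))        # bob's single primitive is present
    print(repo.is_complete(alice))      # alice's patch is still missing one
```

The count check is only the simplest form of the safeguard described above; it says nothing about how the names would be stored on disk.
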
