Hi list,

Here's that summary I promised (draft 2). Your comments are most welcome!

A handful of darcs developers happened to be at this year's Haskell
workshop, namely David Roundy, Ian Lynagh, Jason Dagit and me, Eric Kow
(did I miss anyone?). We were also lucky to have Andres Löh chatting
with us, providing strategic advice in our patch theory discussions.
Also making an appearance were Bjorn Bringert, Daan Leijen and Ralf/ph
the physicist (hi?).

The good news is that we have made progress and that we have a clearer
picture of what is going on... or at least that's how David sees it. Ian
might beg to differ. Unfortunately, I did not understand enough of the
discussion to say exactly what we have accomplished, but hopefully,
there will be enough pieces of the puzzle for us to work with.

Topics covered
--------------
Sunday lunch and dinner
 - bug and patch tracking
 - handling Unicode correctly

Wednesday darcs meetup (patch theory 2.0)
 - resolutions are not inverse patches
 - conflicts as branches
 - identical patches should not be conflicts

Sunday lunch and dinner
-----------------------
Jason, David and Eric

Sunday's discussions took place at the Haskell Workshop lunch and at a
restaurant near the ICFP hotel. There were loads of Haskellers that
night in our group in the restaurant, but not quite within earshot.

1) Bug and patch tracking.

For issue tracking, we're quite happy with Roundup. It's easy to
configure, it plays well with email, and its "nosy list" feature allows
people to control which issues they want to follow. Something like
Trac's ticket tracker is probably not what we want.

What would be nice is if we had some means of tracking *patches*.
Perhaps we could change _darcs/prefs/email so that patches are sent
to [EMAIL PROTECTED] (Eric would suggest [EMAIL PROTECTED] so that our
scripts distinguish between the two). Perhaps a specially designed
patch tracker written in Haskell would be useful. Or maybe Zachary's
adaptation of Patch Watcher will suit our needs.

To do:
 - Correct the problem where replies to [email protected] (instead of
   [EMAIL PROTECTED]) aren't being noticed by Roundup. One possible
   solution is for David to edit his procmail recipes so that these are
   autobounced to the right place.
 - Eric: play around with Zachary's patch watcher modifications.

2) Handling Unicode correctly is hard.

I'm sure Juliusz would have something to say about this. Note, this is
a mashup of our discussion notes, IRC and some emails.

There are three distinct issues to deal with: metadata, file contents
and filenames.

i. Metadata - Right now, the metadata is stored as raw bytes in
   whatever encoding the user provided it. There seems to be a
   consensus that migrating to UTF-8 metadata would be a good thing,
   one day. But what makes things difficult is figuring out how to make
   the transition. If I understand correctly, the subissues are
   (a) encoding the metadata from the terminal into UTF-8 and back, and
   (b) metadata from old patches. Do we trust the user's locale? Do we
   have the user pass in a flag telling us how to interpret it?

ii. File contents - for now, we can recognise text files encoded in
   any 8-bit-friendly format (e.g. ASCII, ISO-8859-1, UTF-8), basically
   because darcs doesn't care about the file contents. Darcs does not
   currently offer support for UTF-16 files. Doing so could be tricky.
   For example, UTF-16 comes in two flavours, a big-endian and a
   little-endian one, which changes how we look for newlines! (All
   UTF-16 files begin with a byte-order-mark that tells you which
   flavour is being used. I [foolishly] presume that nobody actually
   stores files in UTF-16BE or UTF-16LE. Otherwise, even more
   trickiness!)

iii. File names - I don't know what we do with filenames right now. Is
   it the same situation as with file contents?

Notes - one might think "UTF-8 can represent all the Unicode code
points, so why bother worrying about UTF-16?" In a word, Windows. The
IRC folk tell me that Windows uses UTF-16 for everything... Yippee.

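To make the newline point a bit more concrete, here is a minimal sketch
of what "looking for newlines" in a UTF-16 file involves. It is in
Python purely for brevity and is not darcs code: the function names
utf16_newline and split_lines_utf16 are made up for illustration, and
it assumes the file really does start with a byte-order mark.

```python
import codecs

def utf16_newline(data: bytes) -> bytes:
    """Return the two bytes that encode '\\n' for this file's flavour."""
    if data.startswith(codecs.BOM_UTF16_LE):      # FF FE
        return "\n".encode("utf-16-le")           # 0A 00
    if data.startswith(codecs.BOM_UTF16_BE):      # FE FF
        return "\n".encode("utf-16-be")           # 00 0A
    # Assumption of this sketch: no BOM means we refuse to guess.
    raise ValueError("no UTF-16 byte-order mark")

def split_lines_utf16(data: bytes) -> list:
    """Split the body (after the BOM) into lines, keeping the newline."""
    newline = utf16_newline(data)
    body = data[2:]                               # drop the 2-byte BOM
    lines, start = [], 0
    # Step two bytes at a time so we only compare whole code units and
    # never match a "newline" that straddles two characters.
    for i in range(0, len(body) - 1, 2):
        if body[i:i + 2] == newline:
            lines.append(body[start:i + 2])
            start = i + 2
    if start < len(body):
        lines.append(body[start:])
    return lines
```

The two-byte stepping matters: in little-endian data, U+0A05 followed
by U+0100 is the byte sequence 05 0A 00 01, which contains 0A 00 at an
odd offset even though there is no newline in the text. A plain byte
search would split the file in the wrong place.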
In any case, encoding stuff should be approached with great prudence. I
would not like to see us seriously eating somebody's data with some
kind of irreversible encoding mistake. Also, as David noted, darcs is
one of those non-trivial applications that has to deal with all sorts
of crazy stuff that people put into files.

If you want to work on this, you might consider consulting message
<[EMAIL PROTECTED]>

Wednesday meet-up
-----------------
Jason, David, Ian, Andres and Eric
(observers: Bjorn, Ralf and hello-from-Daan)

The actual meetup occurred right after they announced the ICFP winners
(Team Smartass won - 2-D is the language of choice). We went up to the
reception and discussed things over dinner. It was kinda noisy, so I'll
use that as an excuse for anything I get wrong in this writeup.

3) Elements of the new theory:

i. Commutation and inverses remain central to patch theory. But they
   won't be the basis for dealing with conflicts.

ii. Composite patches are not first class objects in darcs. Note that
   this is just the inner theory at work; the user interface will
   probably still call them patches. Anyway, the only patches that
   darcs will manipulate as such are the primitive patches, that is,
   hunk patches, add/remove files/directories. Names on patches will
   just be annotations of the patches.

iii. Conflicts are trees. Consider the situation where patch A is
   followed by either B or C, which conflict:

     A { B ; C }

   David's idea is that we would introduce a new patch type, known as
   a "resolution patch". In the above example, the user might choose
   to "cancel" patch C, which we write like this:

     A { B ; (C) }

   What we want is for this to be effectively equivalent to

     A B

4) Should we treat resolutions as inverses? No.

One idea is that a resolution patch could be implemented as an inverse
patch. In other words, the situation:

  A { B ; C }

resolved as

  A { B ; (C) }

is actually the same as

  A C C^ B

There was quite a bit of discussion about this.
In the end, we decided that we could live without this property and
that we should attend to the core problem.

5) Patches and cancellation.

If I understand correctly, we're going back to the idea of creating a
resolution aka cancellation patch. We're now going to be working with
two different, but closely related notions: patch cancellation vs.
patch death. A patch is canceled because the user says so (she creates
a resolution patch that cancels it). Based on this:

 i.   If a patch is not canceled, it is a live patch, or alive.
 ii.  If a patch is depended on by a live patch, it too is alive.
 iii. Otherwise, the patch is dead.

There was a bunch of discussion on this, most of which I stopped being
able to follow. My notes tell me that the work will largely have
something to do with the function get_common_and_uncommon. Also, the
Lynagh Counterexample (demonstrating that a patch could be both dead
and alive) provoked furious scribbling of numerous and incomprehensible
trees. Here's the example in case anybody wants something to chew on:

  { A B X ; A C Y ; B' C' Z }

Finally, I have some other statements noted down, because they seemed
important.
 - David: if a patch shows up once in a tree, that's all it should need
   to show up
 - David: the branching structure is a derived quantity
 - ??? : what do cancellation patches commute with?

6) Identical patches

We also discussed the plan for how we would handle identical patches,
that is, the infamous doppelganger patch problem. The basic idea is
that two identical patches with different names should NOT conflict.
This should make some common behaviours "just work": diff -u style
patches being applied to two different repos, two people fixing
whitespace issues, or obvious identical changes.

Jason asked what would happen if a patch with two instances was
unpulled. David believes the change would still be there, but just
with one name.

One thing which became clear is that you do not want to implement this
by just treating two otherwise identical patches with different names
as being the same patch. The reason for keeping them distinct is to
leave the door open for new patch types in the future. A good example
to imagine is if one day, we decided to introduce an increment-token
patch, so that we can say stuff like foo++ directly in darcs. Now what
happens if you have TWO patches incrementing foo? Collapsing these into
a single patch with two names would be changing the semantic behaviour
of things.

Also, some words of caution. David pointed out that what might be
confusing for users is that although two identical patches will not
conflict, two non-identical patches with similar changes _will_
conflict.

Finally, I believe this is some stuff that Ian remains unconvinced
about. Mainly, what about commutation? Another concern was that patch
equality now becomes an O(n) operation instead of an O(1) one. Is patch
equality an important enough operation for this to be a cause for
concern?

Conclusion
----------
So how did we do? I'm not too sure. A recent poll of 4 participants
revealed that:
 - 25% (David) report being happy and confident
 - 25% (Ian) report _not_ being happy
 - 25% (Eric) are still struggling to figure out what happened
 - 25% abstained

Well... we still have papers to read, discussions to hold, beers to
drink. But we'll get there!

Thanks to Ian and David for comments on my first draft of this summary.

Notation 2006-09-26
-------------------
Here is some notation we might use to discuss this stuff over mail.
Whitespace is meaningless.

A B C
   A sequence of patches, A then B, then C

(A)
   Patch A has been "canceled". Perhaps (A B C) should be treated as
   equivalent to (A) (B) (C), but I won't do that unless we agree it's
   a good idea

A^
   Inverse of patch A (short for LaTeX's ^{-1})

A { B ; C ; D }
   Sequence containing A, and then a conflict between B, C and D.
   The basic idea is that a conflict is a tree of patches! Note that
   you can have complicated stuff like

     A { B { X ; Y } ; C ; D }

   Note that we can also write this down as

     A { B { X ; Y }
       ; C
       ; D
       }

!!
   Bang! That branch is dead. For example,

     A { B C D !! E F !! G }

I have a bunch of trees scribbled down in my notes. I can write them
down for you if you guys want. Not sure what order they go in, despite
my attempts at timestamping the sheets.

-- 
Eric Kow                     http://www.loria.fr/~kow
PGP Key ID: 08AC04F9         Merci de corriger mon français.
_______________________________________________
darcs-users mailing list
[email protected]
http://www.abridgegame.org/mailman/listinfo/darcs-users
