[darcs-users] ICFP meetup notes (2006-09-17 and 2006-09-20)

Eric Y. Kow Thu, 28 Sep 2006 14:08:00 -0700

Hi list,

Here's that summary I promised (draft 2).  Your comments most welcome!


A handful of darcs developers happened to be at this year's Haskell
workshop, namely David Roundy, Ian Lynagh, Jason Dagit and me, Eric Kow
(did I miss anyone?).  We were also lucky to have Andres Löh chatting
with us, providing strategic advice in our patch theory discussions.
Also making an appearance were Bjorn Bringert, Daan Leijen and Ralf/ph
the physicist (hi?).

The good news is that we have made progress and that we have a clearer
picture of what is going on... or at least that's how David sees it.  Ian
might beg to differ.  Unfortunately, I did not understand enough of the
discussion to say exactly what we have accomplished, but hopefully,
there will be enough pieces of the puzzle for us to work with.

Topics covered
--------------
Sunday lunch and dinner
 - bug and patch tracking
 - handling Unicode correctly

Wednesday darcs meetup (patch theory 2.0)
 - resolutions are not inverse patches
 - conflicts as branches
 - identical patches should not be conflicts

Sunday lunch and dinner
-----------------------
Jason, David and Eric

Sunday's discussions took place at the Haskell Workshop lunch and at a
restaurant near the ICFP hotel.  There were loads of Haskellers that
night in our group in the restaurant, but not quite within earshot.

1) Bug and patch tracking.

   For issue tracking, we're quite happy with Roundup.  It's easy to
   configure, it plays well with email, and its "nosy list" feature lets
   people control which issues they want to follow.  Something like
   Trac's ticket tracker is probably not what we want.

   What would be nice is if we had some means of tracking *patches*.
   Perhaps we could change the _darcs/prefs/email so that patches are
   sent to [EMAIL PROTECTED] (Eric would suggest [EMAIL PROTECTED] so that
   our scripts distinguish between the two).  Perhaps a specially
   designed patch tracker written in Haskell would be useful.  Or maybe
   Zachary's adaptation of Patch Watcher will suit our needs.

   To do:
   - Correct the problem where replies to [email protected] (instead
     of [EMAIL PROTECTED]) aren't being noticed by Roundup.  One possible
     solution is for David to edit his procmail recipes so that these
     are autobounced to the right place.
   - Eric: play around with Zachary's patch watcher modifications.

2) Handling Unicode correctly is hard.  I'm sure Juliusz would have
   something to say about this.  Note, this is a mashup of our discussion
   notes, IRC and some emails.  There are three distinct issues to deal
   with: metadata, file contents and filenames.

     i.   Metadata - Right now, the metadata is stored as raw bytes
          in whatever encoding the user provided them.  There seems
          to be a consensus that migrating to UTF-8 metadata would
          be a good thing, one day.  But what makes things difficult
          is figuring out how to make the transition.

          If I understand correctly, the subissues are (a) encoding the
          metadata from the terminal into UTF-8 and back (b) metadata
          from old patches.  Do we trust the user's locale?  Do we have
          the user pass in a flag telling us how to interpret it?

     ii.  File contents - for now, we can recognise text files encoded
          in any 8-bit-friendly format (e.g. ASCII, ISO-8859-1, UTF-8),
          basically because darcs doesn't care about the file contents.

          Darcs does not currently offer support for UTF-16 files.
          Doing so could be tricky.  For example, UTF-16 comes in two
          flavours, a big-endian and a little-endian one, which changes
          how we look for newlines!

          (All UTF-16 files begin with a byte-order-mark that tells you
          which flavour is being used.  I [foolishly] presume that
          nobody actually stores files in UTF-16BE or UTF-16LE.
          Otherwise, even more trickiness!)

    iii.  File names - I don't know what we do with filenames right
          now.  Is it the same situation as with file contents?
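
   To make the UTF-16 newline problem from (ii) concrete, here is a rough
   Python sketch.  It is not darcs code: the function name is made up
   for illustration, and it assumes the file starts with a byte-order
   mark (BOM-less UTF-16BE/LE input is rejected rather than guessed at).
   It shows that the newline byte pattern depends on the endianness, and
   that we have to step through whole 16-bit units instead of searching
   for raw bytes.

```python
import codecs


def split_utf16_lines(data: bytes) -> list[bytes]:
    """Split UTF-16 data into lines, keeping each line's raw bytes.

    Assumption: the data begins with a byte-order mark.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        newline = b"\n\x00"   # U+000A with the low byte first
    elif data.startswith(codecs.BOM_UTF16_BE):
        newline = b"\x00\n"   # U+000A with the high byte first
    else:
        raise ValueError("no UTF-16 byte-order mark found")

    lines = []
    start = 2                 # skip the 2-byte BOM
    # Walk whole 16-bit code units so that we never match a newline
    # pattern that straddles two neighbouring characters.
    for i in range(2, len(data) - 1, 2):
        if data[i:i + 2] == newline:
            lines.append(data[start:i + 2])
            start = i + 2
    if start < len(data):
        lines.append(data[start:])
    return lines
```

   A plain byte search would go wrong on, say, big-endian U+4100
   followed by U+0A41: the bytes are 41 00 0A 41, which contain the
   pattern 00 0A across a character boundary even though there is no
   newline in the text.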

   Notes - one might think "UTF-8 can represent all the Unicode code
   points, so why bother worrying about UTF-16?"  In a word, Windows.
   The IRC folk tell me that Windows uses UTF-16 for everything...
   Yippee.

   In any case, encoding stuff should be approached with great prudence.
   I would not like to see us seriously eating somebody's data with some
   kind of irreversible encoding mistake.  Also, as David noted, darcs
   is one of those non-trivial applications that has to deal with all
   sorts of crazy stuff that people put into files.

   If you want to work on this, you might consider consulting
   message <[EMAIL PROTECTED]>

Wednesday meet-up
-----------------
Jason, David, Ian, Andres and Eric
(observers: Bjorn, Ralf and hello-from-Daan)

The actual meetup occurred right after they announced the ICFP winners
(Team Smartass won - 2-D is the language of choice).  We went up to the
reception and discussed things over dinner.  It was kinda noisy, so I'll
use that as an excuse for anything I get wrong in this writeup.

3) Elements of the new theory:

   i. Commutation and inverses remain central to patch theory.
       But they won't be the basis for dealing with conflicts.

   ii. Composite patches are not first class objects in darcs.
       Note that this is just the inner theory at work; the user
       interface will probably still call them patches.  Anyway, the
       only patches that darcs will manipulate as such are the
       primitive patches, that is, hunk patches, add/remove
       files/directories.  Names on patches will just be annotations of
       the patches.

   iii. Conflicts are trees.
        Consider the situation where patch A is followed by either B or
        C, which conflict:
           A { B ; C }

        David's idea is that we would introduce a new patch type, known
        as a "resolution patch".  In the above example, the user might
        choose to "cancel" patch C, which we write like this:
           A { B ; (C) }

        What we want is for this to be effectively equivalent to
           A B
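
        As a way of playing with this, here is a small Python sketch of
        a conflict tree with cancellation.  The Node type and the
        effective function are invented for illustration and are not
        part of darcs; the sketch also assumes that cancelling a patch
        hides everything below it, which ignores the live/dead
        refinement discussed in section 5.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A patch followed by zero or more alternative (conflicting) branches."""
    name: str
    cancelled: bool = False
    branches: list["Node"] = field(default_factory=list)


def effective(node: Node) -> list[list[str]]:
    """Return every patch sequence through the tree that avoids
    cancelled patches.  One sequence means no conflict is left."""
    if node.cancelled:
        return []
    tails = [seq for branch in node.branches for seq in effective(branch)]
    if not tails:
        # Leaf, or every branch below this patch was cancelled.
        return [[node.name]]
    return [[node.name] + tail for tail in tails]


unresolved = Node("A", branches=[Node("B"), Node("C")])
resolved = Node("A", branches=[Node("B"), Node("C", cancelled=True)])
print(effective(unresolved))  # two alternatives, so still a conflict
print(effective(resolved))    # a single sequence: A then B
```

        Under these assumptions A { B ; (C) } reduces to the single
        sequence A B, while A { B ; C } still yields two alternatives.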

4) Should we treat resolutions as inverses?  No.

   One idea is that a resolution patch could be implemented as an
   inverse patch.  In other words, the situation:
      A { B ; C }
   Resolved as
      A { B ; (C) }
   Is actually the same as
      A C C^ B

   There was quite a bit of discussion about this.  In the end, we
   decided that we could live without this property and that we should
   attend to the core problem.

5) Patches and cancellation.

   If I understand correctly, we're going back to the idea of creating
   a resolution aka cancellation patch.  We're now going to be working
   with two different, but closely related notions: patch cancellation
   vs. patch death.  A patch is canceled because the user says so
   (she creates a resolution patch that cancels it).  Based on this:

     i.   If a patch is not canceled, it is a live patch, or alive.
     ii.  If a patch is depended on by a live patch, it too is alive.
     iii. Otherwise, the patch is dead.
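
   Read literally, the three rules above amount to a reachability
   computation.  Here is a minimal Python sketch, assuming we are handed
   a fixed dependency map and a set of canceled patches (both are
   hypothetical inputs chosen for illustration; how darcs would actually
   derive them was not settled at the meetup).

```python
def live_patches(depends_on: dict[str, set[str]],
                 cancelled: set[str]) -> set[str]:
    """Return the set of live patches.

    depends_on maps each patch to the patches it directly depends on.
    """
    # Rule i: every patch that has not been canceled is alive.
    alive = {p for p in depends_on if p not in cancelled}
    # Rule ii: anything a live patch depends on is alive too, so we
    # follow dependencies outwards from the live set.
    todo = list(alive)
    while todo:
        patch = todo.pop()
        for dep in depends_on.get(patch, set()):
            if dep not in alive:
                alive.add(dep)
                todo.append(dep)
    # Rule iii: whatever is not in the returned set is dead.
    return alive
```

   For example, with B and C both depending on A, canceling A and C
   leaves A and B alive: B was never canceled, and it keeps A alive.
   Only C is dead.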

   There was a bunch of discussion on this, most of which I stopped
   being able to follow.  My notes tell me that the work will largely
   have something to do with the function get_common_and_uncommon.

   Also, the Lynagh Counterexample (demonstrating that a patch could be
   both dead and alive) provoked furious scribbling of numerous
   and incomprehensible trees.  Here's the example in case anybody
   wants something to chew on
        { A  B  X
        ; A  C  Y
        ; B' C' Z }

   Finally, I have some other statements noted down, because they
   seemed important.

    - David: if a patch shows up once in a tree, that's all it should
             need to show up

    - David: the branching structure is a derived quantity

    - ???  : what do cancellation patches commute with?

6) Identical patches

   We also discussed the plan for how we would handle identical patches,
   that is, the infamous doppelganger patch problem.  The basic idea is
   that two identical patches with different names should NOT conflict.
   This should make some common behaviours "just work": diff -u style
   patches being applied to two different repos, two people fixing
   whitespace issues, or obvious identical changes.

   Jason asked what would happen if a patch with two instances was
   unpulled.  David believes the change would still be there, but
   just with one name.

   One thing which became clear is that you do not want to implement
   this by just treating two otherwise identical patches with different
   names as being the same patch.  The reason to avoid that shortcut is
   that we want to leave the door open for new patch types in the
   future.  A good example to imagine is if one day, we decided to
   introduce an increment-token patch, so that we can say stuff like
   foo++ directly in darcs.  Now what happens if you have TWO patches
   incrementing foo?  Collapsing these into a single patch with two
   names would be changing the semantic behaviour of things.
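
   The foo++ case is easy to simulate.  The Python sketch below models
   the hypothetical increment-token patch as a (name, token) pair; none
   of these names exist in darcs.  It compares applying both patches
   with applying the result of a naive "collapse identical patches"
   step.

```python
def apply_all(patches, env):
    """Apply (name, token) increment patches in order to a copy of env."""
    result = dict(env)
    for _name, token in patches:
        result[token] = result.get(token, 0) + 1
    return result


def collapse_identical(patches):
    """Naively merge patches with identical content, keeping the first name."""
    seen = set()
    kept = []
    for name, token in patches:
        if token not in seen:
            seen.add(token)
            kept.append((name, token))
    return kept


patches = [("alice-inc", "foo"), ("bob-inc", "foo")]
print(apply_all(patches, {"foo": 0}))                      # {'foo': 2}
print(apply_all(collapse_identical(patches), {"foo": 0}))  # {'foo': 1}
```

   With both patches kept, foo ends up at 2; after collapsing them it
   ends up at 1, which is exactly the change in semantics we want to
   avoid.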

   Also, some words of caution.

   David pointed out that what might be confusing for users is that
   although two identical patches will not conflict, two non-identical
   patches with similar changes _will_ conflict.

   Finally, I believe this is some stuff that Ian remains unconvinced
   about.  Mainly, what about commutation?  Another concern was that
   patch equality now becomes an O(n) operation instead of O(1).
   Is patch equality an important enough operation for this to be a
   cause for concern?

Conclusion
----------
So how did we do?  I'm not too sure.  A recent poll of 4 participants
revealed that:

- 25% (David) report being happy and confident
- 25% (Ian) report _not_ being happy
- 25% (Eric) are still struggling to figure out what happened
- 25% abstained

Well... we still have papers to read, discussions to hold, beers to drink.
But we'll get there!

Thanks to Ian and David for comments on my first draft of this summary.

Notation 2006-09-26
-------------------
Here is some notation we might use to discuss this stuff over mail.
Whitespace is meaningless.

A B C
        A sequence of patches, A then B, then C

(A)
        Patch A has been "canceled".
        Perhaps (A B C) should be treated as equivalent to (A) (B)
        (C), but I won't do that unless we agree it's a good idea

A^
        Inverse of patch A (short for LaTeX's ^{-1})

A { B ; C ; D }

        Sequence containing A, and then a conflict between B, C and D.
        The basic idea is that a conflict is a tree of patches!  Note
        that you can have complicated stuff like
        A { B { X ; Y } ; C ; D }

        Note that we can also write this down as
        A { B { X
              ; Y }
          ; C
          ; D }

!!
        Bang! That branch is dead.  For example,
        A { B C D !!
            E F !!
            G }

I have a bunch of trees scribbled down in my notes.  I can write them
down for you if you guys want.  Not sure what order they go in, despite
my attempts at timestamping the sheets.

-- 
Eric Kow                     http://www.loria.fr/~kow
PGP Key ID: 08AC04F9         Merci de corriger mon français.
