[darcs-users] Parsing Patches (was: Where Arch is going)

Max Battcher Fri, 03 Jun 2005 12:16:40 -0700

David Roundy wrote:

> Another interesting problem
> would be that of creating a sort of lexing/parsing language that would
> allow customized patch types that are specific to a particular programming
> language.  This is a particularly hard problem, as you'd need to have the
> parsing always succeed and always give meaningful (and reasonable) results.
> And the resulting patches would have to merge and commute in a meaningful
> and useful manner.

I understand why you think it is important to create meaningful patches for non-parsing code for those instances when you want to save unstable works in progress.

But I think there might be a good case for restricting such a patch type to documents that parse (and using hunks when a document won't parse). Documents that parse properly make much better semantic sense (and thus give better cherry-pickable patches).

I think the big trick here is in commuting parse patches with hunk patches. Let me see if I can explain how I'm thinking this might work...


Assume, for simplicity, that every patch affects only a single file.

A tree structure is the result of parsing a well-formed file state [A]:

B<T> = Parse<T>[A]

A file can be "canonized" (pretty printed) from a tree structure:

[A]' = Canon<T>(B<T>)

[A] is not necessarily equal to [A]'. However, for a given repo the pretty printed [A]' can be considered definitive (it is what the repo owner really "wants"), and we can ignore/forget whatever [A] was. What this means is that for the chosen repo, whenever B<T> can be formed, [A]' should be able to replace [A] without changing the meaning or irritating the user. Who would complain if every time you recorded a parseable file Darcs pretty printed it back? You will need to canonize tree patches this way anyway to create files from the empty tree (assuming you start entirely from tree patches).
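To make the Parse/Canon relationship concrete, here is a minimal sketch. The toy S-expression grammar and the names parse and canon are assumptions chosen for illustration; they stand in for Parse<T> and Canon<T> and are not anything Darcs provides.

```python
# Toy stand-ins for Parse<T> and Canon<T>. The S-expression grammar is an
# assumption made for this sketch, not a format used by Darcs.

def parse(text):
    """Parse<T>: file state -> tree (nested lists of atoms).

    Raises ValueError when the text is not well formed.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    stack = [[]]  # stack[0] collects the top-level forms
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unmatched ')'")
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("missing ')'")
    return stack[0]


def canon(tree):
    """Canon<T>: tree -> canonical text, one top-level form per line."""
    def show(node):
        if isinstance(node, list):
            return "(" + " ".join(show(child) for child in node) + ")"
        return node
    return "".join(show(form) + "\n" for form in tree)


# [A]: what the user typed, with arbitrary spacing and line breaks.
a = "(define   x\n   (add 1    2))\n(print x)"

# [A]' = Canon<T>(Parse<T>[A])
a_canon = canon(parse(a))
```

In this sketch [A] and [A]' differ as text but parse to the same tree, and canonizing [A]' a second time changes nothing, which is the property that would let [A]' replace [A] in the repo.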

You can create a hunk patch in reference to [A]'. The problem is then in the commutation. A hunk patch commuted with a tree patch would result in two hunk patches.


[D] = B<T> C <-> C' B'

At first glance, this seems useless, because if you commute a hunk patch all the way down, you now have hunk representations of all of the tree patches with respect to the canon. [D] may not be parseable, but [E] might be.


[E]' = C' B' F (F would have to be a hunk patch)

This means that C' B' F would be equivalent to, and thus could be merged into, Parse<T>[E]'. In this case, B<T> could then be trivially commuted back out of the merge:


{C F} = Parse<T>[E]' - B<T> (assuming - is the tree diff operation)

What this means is that two or more hunk patches could be composed/merged into a tree patch. Most likely you are going to be more interested in {C F} than in one or the other alone, anyway. Any stable repository with only tree patches (and composed tree patches) could return to sender any single hunk patches for being unstable without even having to build and test the repository, as it would be inherently unparsable, malformed, and thus uncompilable. These composed tree patches thus better model a developer's workflow, as there might be many intermediate patches from each sitting that result in one decent, parseable state.
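The composition described above can be sketched roughly as follows. Everything here is hypothetical: the hunk representation (start index, old lines, new lines), the toy S-expression parser, and tree_diff as a stand-in for the "-" tree diff operation are assumptions for illustration only, and the sketch skips the commutation step by applying the hunks directly on top of the canonical text of B<T>.

```python
# Hypothetical sketch: two hunks that are unparseable on their own are
# composed into one tree-level change once the combined state parses.

def parse(text):
    """Toy Parse<T>: nested lists, or None when the text is malformed."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                return None  # unmatched ')'
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    return stack[0] if len(stack) == 1 else None


def apply_hunk(lines, hunk):
    """Apply a hunk (start index, old lines, new lines) to a list of lines."""
    start, old, new = hunk
    if lines[start:start + len(old)] != old:
        raise ValueError("hunk does not apply")
    return lines[:start] + new + lines[start + len(old):]


def tree_diff(before, after):
    """Toy '-' operation: top-level forms removed and added."""
    removed = [form for form in before if form not in after]
    added = [form for form in after if form not in before]
    return removed, added


base = ["(define x 1)", "(print x)"]  # canonical text of B<T>
tree_b = parse("\n".join(base))

hunk_c = (1, ["(print x)"], ["(print (add x"])  # leaves a paren open
hunk_f = (1, ["(print (add x"], ["(print (add x", "  2))"])  # closes it

state_d = apply_hunk(base, hunk_c)     # [D]: after C only
state_e = apply_hunk(state_d, hunk_f)  # [E]: after C and F

d_parses = parse("\n".join(state_d)) is not None
tree_e = parse("\n".join(state_e))

# {C F} = Parse<T>[E]' - B<T>
composed = tree_diff(tree_b, tree_e)
```

Here [D] fails to parse, so a repository that accepts only tree patches could reject C alone without building anything, while C and F together yield a single tree-level change.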

The next and final difficulty is then in determining if some tree type T and some other U commute. Although I talk about them as normal hunk patches, you would have to encode T in the hunk patches so that you can commute them with U. Where this would be used is for things like indent preference that would affect the canonization, but would still be commutative if you translate between the canonical forms. This would only be necessary for unstable repositories where multiple lone tree-hunks of different canonization types would be stored. Once composed with other tree-hunks into a tree patch, the canonization "details" don't matter again.
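As a rough illustration of translating between canonical forms, the sketch below treats T and U as the same toy grammar with different indent widths. The names canon and translate, the indent parameter, and the S-expression grammar are all assumptions made for this example; it does not attempt the commutation itself.

```python
# Hypothetical sketch: canonization types T and U differ only in indent
# width, so text in one canonical form can be translated to the other by
# going through the tree.

def parse(text):
    """Toy Parse: nested lists of atoms; raises ValueError if malformed."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unmatched ')'")
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("missing ')'")
    return stack[0]


def canon(tree, indent):
    """Canonizer with one token per line, nested by `indent` spaces."""
    lines = []

    def emit(node, depth):
        pad = " " * (indent * depth)
        if isinstance(node, list):
            lines.append(pad + "(")
            for child in node:
                emit(child, depth + 1)
            lines.append(pad + ")")
        else:
            lines.append(pad + node)

    for form in tree:
        emit(form, 0)
    return "\n".join(lines) + "\n"


def translate(text, to_indent):
    """Re-express canonical text in another canonization type."""
    return canon(parse(text), to_indent)


tree = parse("(if ok (run fast))")
text_t = canon(tree, 2)  # canonical form under T (2-space indent)
text_u = canon(tree, 4)  # canonical form under U (4-space indent)
```

The two canonical texts differ line by line, so hunks recorded against one would not apply to the other as-is, but both parse to the same tree and translate maps each onto the other exactly.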

By the way: Pappy the packrat parser[1] (Haskell code) and its PEGs might be one of the more interesting choices, as opposed to BNF, for building these parse patches, particularly because it might offer a way to also create canonizers from the same PEG (which would be very tough, if not impossible, from a BNF). I don't think anyone's ever programmed such a thing, but I'm thinking it might be an interesting experiment.


[1] http://pdos.csail.mit.edu/~baford/packrat/

I hope that that helps to get some of the idea across. I'm not sure I've explained things that well, or even if I am thinking in the right direction. Hopefully, though, it might spark ideas or experiments for someone else to try.


--
--Max Battcher--
http://www.worldmaker.net/

The WorldMaker.Network: Support Open/Free Mythoi. Read the manifesto @mythoi.com


