David Roundy wrote:
On Fri, Jun 03, 2005 at 03:14:55PM -0400, Max Battcher wrote:
A file can be "canonized" (pretty printed) from a tree structure:

[A]' = Canon<T>(B<T>)

[A] is not necessarily equal to [A]'.


I don't consider this acceptable.  It depends what data you're storing, but
the whitespace in my C++ programs is very often important in making it easy
to read, and I'd be upset if recording a patch messed that up.

For machine-generated XML documents, on the other hand, this would probably
be fine.  But I don't use those, so they aren't so interesting to me.  The
parser I'm interested in would be one that is reversible.  In some ways
this is more of a pain than an irreversible parser (since you don't get to
throw information away), but in other ways it may be simpler, since it
means you may be able to construct a smallish set of reversible primitives
from which you construct the parser, and one might be able to determine the
commutation behavior of those primitives in some sense.
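
To make the "reversible primitives" idea concrete, here is a minimal sketch in Python. Nothing in it comes from darcs: the `Prim` class, the three token classes, and the `parse`/`unparse` names are assumptions chosen for illustration. Each primitive consumes a non-empty piece of the input and keeps the exact text it consumed, so printing the tree back gives the original file byte for byte.

```python
import re

class Prim:
    """A hypothetical reversible primitive: it records exactly what it consumed."""
    def __init__(self, name, pattern):
        self.name = name
        self.regex = re.compile(pattern)

    def parse(self, text, pos):
        m = self.regex.match(text, pos)
        if m is None:
            return None
        return (self.name, m.group()), m.end()

# Illustrative set of primitives: together they cover every possible character.
PRIMS = [
    Prim("space", r"\s+"),      # whitespace is kept, not thrown away
    Prim("word", r"\w+"),
    Prim("punct", r"[^\w\s]"),
]

def parse(text):
    """Turn text into a flat list of (kind, exact_text) nodes."""
    pos, nodes = 0, []
    while pos < len(text):
        for prim in PRIMS:
            result = prim.parse(text, pos)
            if result is not None:
                node, pos = result
                nodes.append(node)
                break
        else:
            raise ValueError(f"no primitive matches at offset {pos}")
    return nodes

def unparse(nodes):
    """Inverse of parse: concatenate the stored text of every node."""
    return "".join(text for _kind, text in nodes)

source = "int  main( )\n{\n\treturn 0;\n}\n"
assert unparse(parse(source)) == source
```

This says nothing yet about commutation; it only shows that a parser built from primitives that never discard input is trivially invertible.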

That was my intent. The Canon function should be "relatively" reversible. I was thinking in terms of a pretty printer: it would store "indent = 2 spaces", "newline = \r\n", and whatever other configuration options a pretty printer would accept for that language. For most people in most instances this should be reversible enough (without wasting large amounts of space), since most people now rely on their development environment's pretty printer to do this anyway. In the few cases where the output would differ, most differences would probably be either accidental (a wrong indent size) or informative (perhaps you didn't realize that a line was attached to a more deeply nested block).
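
A rough sketch of what I mean, in Python. The `Style` record, the indentation-only "language", and the `parse`/`canon` names are all made up for the example; a real parser would be per-language. The style settings are stored once next to the tree, and Canon rebuilds the text from the two together.

```python
from dataclasses import dataclass

@dataclass
class Style:
    """Hypothetical pretty-printer settings stored alongside the tree."""
    indent: str = "  "
    newline: str = "\n"

def _width(line):
    # Number of leading spaces on a line.
    return len(line) - len(line.lstrip(" "))

def parse(text):
    """Return (style, tree), where tree is a list of (depth, content) pairs."""
    newline = "\r\n" if "\r\n" in text else "\n"
    lines = text.split(newline)
    if lines and lines[-1] == "":
        lines.pop()  # the file ended with a newline
    # Guess the indent unit as the smallest non-zero indentation in the file.
    unit = min((_width(l) for l in lines if _width(l) > 0), default=2)
    tree = [(_width(l) // unit, l.strip()) for l in lines]
    return Style(indent=" " * unit, newline=newline), tree

def canon(style, tree):
    """Pretty print the tree using the stored style."""
    return "".join(style.indent * depth + content + style.newline
                   for depth, content in tree)

tidy = "if x:\n  y()\n  if z:\n    w()\n"
assert canon(*parse(tidy)) == tidy          # consistent file: round trip is exact

sloppy = "if x:\n  y()\n   z()\n"           # third line is indented 3 spaces
assert canon(*parse(sloppy)) != sloppy      # the accidental indent is normalized
```

For the consistently formatted file, [A]' equals [A]. For the sloppy one, the only difference is the stray extra space, which is the "accidental" case.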

The developer of the parser would also be free to include in the parse tree anything they felt might be important to people: tokens like "additional newline" or "no newline" between statements, for example. The real key here is using a parsing framework powerful enough to allow such things. Obviously, things like comments are going to be stored as well. Hopefully this balances robustness with simplicity.
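
As a toy illustration in Python (the node kinds `statement`, `comment`, and `additional_newline`, and the `//` comment syntax, are invented for the example, and indentation is ignored to keep it short): blank lines and comments become nodes of their own, so they survive canonization, while incidental details such as trailing spaces do not.

```python
def parse(text):
    """Return a list of (kind, text) nodes, one per source line."""
    nodes = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            # A blank line the author left on purpose: keep it as a token.
            nodes.append(("additional_newline", ""))
        elif stripped.startswith("//"):
            nodes.append(("comment", stripped))
        else:
            nodes.append(("statement", stripped))
    return nodes

def canon(nodes):
    """Print one node per line; an additional_newline node prints as an empty line."""
    return "\n".join(text for _kind, text in nodes) + "\n"

source = "a = 1;   \n\n// setup done\nb = 2;\n"
nodes = parse(source)
assert ("additional_newline", "") in nodes
assert canon(nodes) == "a = 1;\n\n// setup done\nb = 2;\n"
```

The blank line and the comment come back exactly where they were; only the trailing spaces after `a = 1;` are gone.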

The idea is that the differences between [A]' and [A] should be acceptable enough to the owner of the current repo that they almost shouldn't notice them, and where they do see a difference, it might even seem "better".

--
--Max Battcher--
http://www.worldmaker.net/
The WorldMaker.Network: Support Open/Free Mythoi. Read the manifesto @ mythoi.com

_______________________________________________
darcs-users mailing list
[email protected]
http://www.abridgegame.org/mailman/listinfo/darcs-users
