David Roundy wrote:
> On Fri, Jun 03, 2005 at 03:14:55PM -0400, Max Battcher wrote:
> > A file can be "canonized" (pretty printed) from a tree structure:
> > [A]' = Canon<T>(B<T>)
> > [A] is not necessarily equal to [A]'.
>
> I don't consider this acceptable. It depends what data you're storing, but
> the whitespace in my C++ programs is very often important in making it easy
> to read, and I'd be upset if recording a patch messed that up.
>
> For machine-generated XML documents, on the other hand, this would probably
> be fine. But I don't use those, so they aren't so interesting to me. The
> parser I'm interested in would be one that is reversible. In some ways
> this is more of a pain than an irreversible parser (since you don't get to
> throw information away), but in other ways it may be simpler, since it
> means you may be able to construct a smallish set of reversible primitives
> from which you construct the parser, and one might be able to determine the
> commutation behavior of those primitives in some sense.
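
To make "reversible" concrete, here is a minimal sketch of a lossless
tokenizer in Python. The token classes and the names parse/unparse are my
own assumptions for illustration, not anything darcs provides: the point is
only that every character of the input ends up in exactly one token, so
concatenating the tokens gives back the original file byte for byte.

```python
import re

# Hypothetical lossless tokenizer: each character of the input falls into
# exactly one token (whitespace run, word, or single punctuation character),
# so no information is thrown away.
TOKEN = re.compile(r"\s+|\w+|[^\w\s]")

def parse(text):
    """Split text into whitespace, word, and punctuation tokens."""
    return TOKEN.findall(text)

def unparse(tokens):
    """Inverse of parse: concatenating the tokens restores the input."""
    return "".join(tokens)

source = "int  main() {\n\treturn 0;\n}\n"
tokens = parse(source)
# Round trip: odd spacing and the tab survive untouched.
assert unparse(tokens) == source
```

A real parser would need a richer set of primitives than these three, but
each one would have to satisfy the same round-trip property.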
That was my intent. The Canon function should be "relatively"
reversible. I was thinking in terms of a pretty printer... it would
store "indent = 2 spaces", "newline = \r\n" and other configuration
options a pretty printer would accept for that language. For most
people in most instances this should be reversible enough (without
wasting large amounts of space), as most people now rely on their
development environment's pretty printer to do this anyway. In those
few cases where they would differ, most of them would probably be
accidental anyway (wrong indent size) or informative (perhaps you didn't
realize that that line was connecting to a deeper nested block).
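
As a rough sketch of what I mean (Python, with a made-up Layout record and a
toy brace-counting parser that are assumptions for illustration only), the
stored configuration plus the tree is enough to regenerate the file, and a
stray indent is the only thing that changes:

```python
from dataclasses import dataclass

@dataclass
class Layout:
    """Hypothetical pretty-printer settings stored alongside the tree."""
    indent: str = "  "    # "indent = 2 spaces"
    newline: str = "\n"

def parse(text):
    """Return (depth, statement) pairs; depth comes from brace nesting."""
    tree, depth = [], 0
    for line in text.splitlines():
        stmt = line.strip()
        if stmt.startswith("}"):
            depth -= 1
        tree.append((depth, stmt))
        if stmt.endswith("{"):
            depth += 1
    return tree

def canon(tree, layout):
    """Pretty print the tree using the stored layout settings."""
    return "".join(layout.indent * depth + stmt + layout.newline
                   for depth, stmt in tree)

original = "if (x) {\n  y();\n   z();\n}\n"   # z() has a stray extra space
pretty = canon(parse(original), Layout())

assert pretty != original                        # [A] is not equal to [A]'
assert canon(parse(pretty), Layout()) == pretty  # but [A]' is stable
```

Here the only difference between [A] and [A]' is the accidental three-space
indent on one line, which is the kind of change I would expect most people
to accept.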
The developer of the parser would be free to include things in the parse
tree that they felt might be important to people as well... tokens like
"additional newline" or "no newline" between statements. The real key
here is in using a powerful enough parsing framework to allow you to do
such things. Obviously, things like comments are going to be stored as
well. Hopefully this balances robustness with simplicity.
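
For instance, a parse tree could carry layout tokens next to the statements.
This is only a sketch in Python under my own assumptions (the node kinds and
the "//" comment syntax are hypothetical), but it shows blank lines and
comments surviving the round trip because they are nodes in the tree:

```python
def parse(text):
    """Turn each line into a (kind, text) node, keeping layout and comments."""
    nodes = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            # A blank line becomes an explicit layout token.
            nodes.append(("additional newline", ""))
        elif stripped.startswith("//"):
            nodes.append(("comment", stripped))
        else:
            nodes.append(("statement", stripped))
    return nodes

def canon(nodes, newline="\n"):
    """Print every node, layout tokens included, one per line."""
    return "".join(text + newline for _, text in nodes)

source = "a();\n\n// explain b\nb();\n"
nodes = parse(source)
assert ("additional newline", "") in nodes
assert canon(nodes) == source
```

How many such tokens a parser keeps is the trade-off: more of them makes the
output closer to the original, fewer keeps the tree simpler.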
The idea is that the differences between [A]' and [A] should be
acceptable enough to the owner of the current repo that they should
hardly notice them, and where they do see a difference it might even
seem "better".
--
--Max Battcher--
http://www.worldmaker.net/
The WorldMaker.Network: Support Open/Free Mythoi. Read the manifesto @
mythoi.com