On 9/5/2010 15:37, Petr Rockai wrote:
Max Battcher<m...@worldmaker.net> writes:
045b19 First line of the file
oac77a second line, tweaked by Lele
fac657 third line, added by John
...
I'm still wondering if the second document may also be better in a YAML format
of some sort... Another toy example:
- 045b19: |
    First line of file
- oac77a: |
    second line, tweaked by Lele
    third line, tweaked by Lele
- fac657: |
    man line, added by John
That doesn't seem that much worse than the existing human-readable annotate
output...
I should remind you that what you propose is not legal YAML, which was
also my reason against using it for annotate in the first place. You
either mangle the file contents doing custom non-YAML quoting, or you
use double-quoted YAML strings with escapes.
The YAML spec explicitly says that there's no other way to encode
arbitrary strings than to use "..." with \-sequences, which will lead to
really hard to read output (both for humans and non-YAML machines).
Is it hard to read? Surely \-sequences are not much harder to read than
existing [_-sequences in darcs. I don't think quote marks subtract much
readability either (most of us are programmers, after all; we see
quoted strings all over the place).
Also, what exactly would we need to escape? YAML's character set is Unicode
(UTF-8 by default), and the excluded characters seem quite explicit:
"The allowed character range explicitly excludes the C0 control block
#x0-#x1F (except for TAB #x9, LF #xA, and CR #xD which are allowed), DEL
#x7F, the C1 control block #x80-#x9F (except for NEL #x85 which is
allowed), the surrogate block #xD800-#xDFFF, #xFFFE, and #xFFFF."
Everything in UTF-8 not explicitly excluded is fair game in a literal
block... How likely are non-binary files to contain characters in the
proscribed ranges? Certainly darcs could check for the proscribed ranges
and switch to a quoted format only when truly necessary.
- 045b19:
    "First line of file\u0000"
- oac77a: |
    second line, tweaked by Lele
    third line, tweaked by Lele
- fac657: |
    man line, added by John
I think human-readability is preserved fine. Sure, it makes "scrape
parsing" tougher, but "use a conforming YAML parser if you need to parse
more complicated annotate output" doesn't sound like bad advice to me.
Even so, it should be obvious that the presence or absence of "|" on the
header line is a clear, regular indicator if you do wish to maintain
some rough per-line/regex scrape.
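
To make the "check for the proscribed ranges and only quote when necessary"
idea concrete, here is a rough Python sketch. It is not darcs code and not a
proposal for the actual implementation; the names (needs_quoting, quote,
emit_entry) and the four-space indent are made up for illustration. It also
assumes each chunk arrives as a list of lines without their line terminators,
and it falls back to quoting for a stray CR or a leading space on the first
non-empty line, since a plain "|" block would not round-trip those. Other
corner cases (NEL, trailing blank lines) are not handled.

```python
import re

# Code points outside the allowed range quoted above (YAML's printable set).
_NON_PRINTABLE = re.compile(
    r"[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

INDENT = "    "  # assumption: content sits deeper than the "- key:" line


def needs_quoting(lines):
    """True if the chunk cannot safely go into a literal ('|') block."""
    first = next((line for line in lines if line), "")
    return first.startswith(" ") or any(
        "\r" in line or _NON_PRINTABLE.search(line) for line in lines
    )


def quote(text):
    """Render text as a double-quoted YAML scalar with backslash escapes."""
    out = []
    for ch in text:
        if ch in '"\\':
            out.append("\\" + ch)
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\r" or _NON_PRINTABLE.match(ch):
            cp = ord(ch)
            out.append(f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\U{cp:08X}")
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


def emit_entry(patch_id, lines):
    """Emit one annotate entry, quoting only when truly necessary."""
    if needs_quoting(lines):
        joined = "\n".join(lines)
        return f"- {patch_id}:\n{INDENT}{quote(joined)}\n"
    body = "".join(f"{INDENT}{line}\n" for line in lines)
    return f"- {patch_id}: |\n{body}"
```

Calling emit_entry("045b19", ["First line of file\x00"]) gives the quoted
form, while ordinary text lines come out as a "|" block, so a scraper can
still key off the header line.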
--
--Max Battcher--
http://worldmaker.net