On 9/5/2010 15:37, Petr Rockai wrote:
Max Battcher<m...@worldmaker.net>  writes:

045b19  First line of the file
oac77a  second line, tweaked by Lele
fac657  third line, added by John
...

I'm still wondering if the second document may also be better in a YAML format
of some sort... Another toy example:

- 045b19: |
   First line of file
- oac77a: |
   second line, tweaked by Lele
   third line, tweaked by Lele
- fac657: |
   man line, added by John

That doesn't seem that much worse than the existing human-readable annotate
output...

I should remind you that what you propose is not legal YAML, which was
also my reason against using it for annotate in the first place. You
either mangle the file contents doing custom non-YAML quoting, or you
use double-quoted YAML strings with escapes.

The YAML spec explicitly says that there's no other way to encode
arbitrary strings than to use "..." with \-sequences, which will lead to
really hard to read output (both for humans and non-YAML machines).

Is it hard to read? Surely \-sequences are not much harder to read than the existing [_-sequences in darcs. I don't think quote marks subtract much from readability either (most of us are programmers, after all; we see quoted strings all over the place).

Also, what exactly would we be escaping? YAML's default character set is UTF-8, and the spec is quite explicit about which characters are excluded:

"The allowed character range explicitly excludes the C0 control block #x0-#x1F (except for TAB #x9, LF #xA, and CR #xD which are allowed), DEL #x7F, the C1 control block #x80-#x9F (except for NEL #x85 which is allowed), the surrogate block #xD800-#xDFFF, #xFFFE, and #xFFFF."

Everything in UTF-8 not explicitly excluded is fair game in a literal block... How likely are non-binary files to contain characters in the proscribed ranges? Certainly darcs could check for the proscribed ranges and switch to a quoted format only when truly necessary.

- 045b19:
  "First line of file\u0000"
- oac77a: |
  second line, tweaked by Lele
  third line, tweaked by Lele
- fac657: |
  man line, added by John

I think human-readability is preserved fine. Sure, it makes "scrape parsing" tougher, but "use a more conforming parser" doesn't sound like bad advice to me if you need to parse more complicated annotate output. In any case, the presence or absence of "|" in the header is a clear, regular determinant even if you do wish to maintain some rough per-line/regex scrape.
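
To make the "check for the proscribed ranges and switch to a quoted format only when necessary" idea concrete, here is a rough sketch in Python (not darcs code, and not how darcs annotate works today). The function names are made up for illustration. The printable test follows the character ranges quoted above; the extra conditions for falling back to a quoted string (line-break-like characters inside a line, a first line starting with a space, a trailing empty line) are conservative assumptions of mine about where a plain "|" block could lose information, not something settled in this thread.

```python
# Hypothetical sketch: choose between a YAML literal block ("|") and a
# double-quoted scalar for one annotate entry. All names are illustrative.

# Characters that some YAML versions treat as line breaks; assumed unsafe
# inside a single line of a literal block.
_BREAK_LIKE = "\n\r\x85\u2028\u2029"

_SHORT_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\t": "\\t", "\r": "\\r"}


def is_yaml_printable(ch: str) -> bool:
    """True if ch falls inside the allowed ranges quoted from the YAML spec."""
    cp = ord(ch)
    return (
        cp in (0x09, 0x0A, 0x0D, 0x85)
        or 0x20 <= cp <= 0x7E
        or 0xA0 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def can_use_literal_block(lines) -> bool:
    """Conservative test for whether a plain '|' block is assumed safe."""
    if not lines or lines[-1] == "":
        return False  # default chomping would drop trailing empty lines
    first = next(line for line in lines if line)
    if first.startswith(" "):
        return False  # leading space would confuse indentation detection
    return all(
        is_yaml_printable(ch) and ch not in _BREAK_LIKE
        for line in lines
        for ch in line
    )


def yaml_double_quote(text: str) -> str:
    """Encode text as a double-quoted YAML scalar with \\-escapes."""
    out = []
    for ch in text:
        if ch in _SHORT_ESCAPES:
            out.append(_SHORT_ESCAPES[ch])
        elif is_yaml_printable(ch) and ch not in _BREAK_LIKE:
            out.append(ch)
        else:
            # Every non-printable code point is <= 0xFFFF, so \uXXXX suffices.
            out.append("\\u%04x" % ord(ch))
    return '"' + "".join(out) + '"'


def emit_entry(patch_id: str, lines) -> str:
    """Render one '- id: value' entry, quoting only when needed."""
    if can_use_literal_block(lines):
        body = "\n".join("  " + line if line else "" for line in lines)
        return "- %s: |\n%s\n" % (patch_id, body)
    # Keep the trailing newline so both forms describe the same text.
    return "- %s:\n  %s\n" % (patch_id, yaml_double_quote("\n".join(lines) + "\n"))


if __name__ == "__main__":
    print(emit_entry("045b19", ["First line of file\x00"]), end="")
    print(emit_entry("oac77a", ["second line, tweaked by Lele",
                                "third line, tweaked by Lele"]), end="")
```

With something along these lines, ordinary text files would keep the "|" form, and only entries containing an excluded character (or one of the assumed edge cases) would pay the readability cost of quoting. Unlike the toy example above, the quoted form here spells out the final line break as \n so that both forms carry the same content.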

--
--Max Battcher--
http://worldmaker.net
_______________________________________________
darcs-users mailing list
darcs-users@darcs.net
http://lists.osuosl.org/mailman/listinfo/darcs-users