Paul Schauble writes:

 > That last is exactly correct. The Byte Order Mark also identifies the
 > file as Unicode. It provides a unique signature for UTF-8, UTF-16LE, and
 > UTF-16BE files.

If people would call the character by its name -- "ZERO WIDTH NO-BREAK
SPACE" (U+FEFF) -- there would be less confusion.  It's a character in
its own right, and can occur anywhere in any Unicode file.  It happens
that its semantics are perfect for a signature (including use as a
BOM), and that the byte sequences encoding it in each of the UTFs are
quite rare in natural text in other encodings.
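
To make that concrete, here is a rough Python sketch of what a leading
U+FEFF looks like in the three encodings mentioned above, and how one
might sniff for it.  The names ZWNBSP, SIGNATURES and sniff_signature
are made up for illustration, it only uses constants from the standard
codecs module, and it deliberately ignores UTF-32 (whose little-endian
signature also begins FF FE), so treat it as a heuristic and not as a
proposal for what darcs should do.

```python
import codecs
import unicodedata

ZWNBSP = "\ufeff"  # hypothetical name for U+FEFF

# Byte sequences that U+FEFF encodes to in each encoding form.
# UTF-32 is left out on purpose; this sketch covers only the three
# encodings discussed in this thread.
SIGNATURES = [
    ("UTF-8", codecs.BOM_UTF8),          # EF BB BF
    ("UTF-16LE", codecs.BOM_UTF16_LE),   # FF FE
    ("UTF-16BE", codecs.BOM_UTF16_BE),   # FE FF
]


def sniff_signature(data: bytes):
    """Return the encoding suggested by a leading U+FEFF, or None.

    Only the start of the data is examined: the same character may
    appear later in the file as ordinary content, and that says
    nothing about the encoding.
    """
    for name, sig in SIGNATURES:
        if data.startswith(sig):
            return name
    return None


if __name__ == "__main__":
    print(unicodedata.name(ZWNBSP))
    print(sniff_signature(ZWNBSP.encode("utf-8") + b"hello"))
    # U+FEFF in the middle of the text is just a character.
    print(sniff_signature("a\ufeffb".encode("utf-8")))
```

A file without the signature returns None here, which means "no hint",
not "not Unicode".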

Pedantic?  Of course!  I think it's worth being pedantic about Unicode
at this stage; the risk of backward-incompatible changes in repository
format due to mistaken implementation of Unicode is too high to be
ignored.