Paul Schauble writes:
> That last is exactly correct. The Byte Order Mark also identifies the
> file as Unicode. It provides a unique signature for UTF-8, UTF-16LE, and
> UTF-16BE files.
If people would call the character by its name -- "ZERO WIDTH NO-BREAK SPACE" -- there would be less confusion. It's a character in its own right, and can occur anywhere in any Unicode file. It happens that its semantics are perfect for a signature (including byte-order marking), and that the byte sequences it encodes to in every UTF are quite rare at the start of natural text in other encodings.

Pedantic? Of course! I think it's worth being pedantic about Unicode at this stage; the risk of backward-incompatible changes in repository format due to a mistaken implementation of Unicode is too high to ignore.

_______________________________________________
darcs-devel mailing list
http://lists.osuosl.org/mailman/listinfo/darcs-devel
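To make the distinction concrete, here is a minimal Python sketch (not darcs code; the helper names `sniff_signature` and `decode_with_signature` are hypothetical). It checks that each "BOM" is simply U+FEFF encoded in a particular UTF, sniffs a leading signature, and strips only a *leading* U+FEFF, leaving any later occurrence as an ordinary character. UTF-32 is included only because its little-endian signature begins with the UTF-16LE one and must be tested first; treat the sniffing as a heuristic under that assumption, not a definitive detector.

```python
import codecs

# U+FEFF ZERO WIDTH NO-BREAK SPACE: the character whose encoded form
# doubles as the byte-order mark / encoding signature.
ZWNBSP = "\ufeff"

# Longest signatures first: UTF-32LE (FF FE 00 00) starts with the
# UTF-16LE signature (FF FE), so it has to be checked before it.
SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_signature(data: bytes):
    """Hypothetical helper: return (encoding, signature_length) if data
    starts with an encoded U+FEFF, else (None, 0). Heuristic only: a
    UTF-16LE stream whose first character is U+0000 looks like UTF-32LE."""
    for sig, name in SIGNATURES:
        if data.startswith(sig):
            return name, len(sig)
    return None, 0


def decode_with_signature(data: bytes, default: str = "utf-8") -> str:
    """Hypothetical helper: drop a *leading* U+FEFF used as a signature;
    any U+FEFF later in the data is ordinary text and is kept."""
    name, n = sniff_signature(data)
    return data[n:].decode(name or default)


# Each signature is nothing more than U+FEFF encoded in that UTF.
for sig, name in SIGNATURES:
    assert ZWNBSP.encode(name) == sig

# Past the start of the stream, U+FEFF is just another character.
text = decode_with_signature(codecs.BOM_UTF8 + "a\ufeffb".encode("utf-8"))
assert text == "a\ufeffb" and len(text) == 3
```

Python's own `utf-8-sig` codec behaves the same way for UTF-8: it removes one leading U+FEFF and passes any later one through unchanged.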
