Thank you all for your quick responses.

On 10/30/07, Jesse Pelton <[EMAIL PROTECTED]> wrote:
> Actually, the XML spec discusses the UTF-8 BOM. See
> http://www.w3.org/TR/2006/REC-xml-20060816/#sec-guessing-no-ext-info.
>
> Whether it makes sense is another question. I suppose it could be used
> to quickly distinguish UTF-8 from ASCII and similar encodings. Since
> conforming processors are required to handle UTF-8 and UTF-16, but no
> other encodings, this might have some value.

It seems to me, and I'm not an expert on Unicode at all, that it would make sense for the doc building tools to handle the UTF-8 BOM if it sometimes gets inserted there.

If, for the sake of consistency, you want to explicitly forbid UTF-8 BOMs in the documentation source but want to make the error clearer, you could do the following: catch the exception that currently gets thrown and make it the InnerException of a new exception with the message "This document has a UTF-8 BOM character. Xerces Documentation source files should not contain a UTF-8 BOM character."

I now know to change a checkbox in XML Copy Editor to prevent the issue from affecting me. I will talk to Gerald about the default behavior of XML Copy Editor. However, GVIM handles the file with or without the BOM, and I couldn't figure out what the problem was until I generated a diff, which rendered the BOM as 2 printable characters in the text file since it was no longer the first byte sequence in the file.

TortoiseMerge and WinMerge both indicate that the first line has changed but do not show the BOM character, which is a bug on their part, so I will report that to their respective maintainers.
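
To make the wrapping idea concrete, here is a rough sketch in Python (not the language the doc tools are actually written in). Everything named here is my own invention for illustration: `DocSourceError`, `load_doc_source`, and the stand-in `picky_parse` are hypothetical and are not part of Xerces or its build tools. Python's `raise ... from ...` plays the role of InnerException by storing the original error in `__cause__`.

```python
import codecs

# Message proposed in this email for the wrapped error.
BOM_MESSAGE = (
    "This document has a UTF-8 BOM character. Xerces Documentation "
    "source files should not contain a UTF-8 BOM character."
)


class DocSourceError(Exception):
    """Hypothetical error type for problems in documentation source files."""


def has_utf8_bom(data: bytes) -> bool:
    # The UTF-8 BOM is the three-byte sequence EF BB BF at the start of a file.
    return data.startswith(codecs.BOM_UTF8)


def load_doc_source(data: bytes, parse):
    """Run `parse` on `data`; if it fails and a BOM is present, explain why.

    `parse` stands in for whatever the doc tools really call (an assumption).
    """
    try:
        return parse(data)
    except Exception as original:
        if has_utf8_bom(data):
            # Keep the original failure attached, like InnerException in .NET.
            raise DocSourceError(BOM_MESSAGE) from original
        raise  # Unrelated failure: let it propagate unchanged.


def picky_parse(data: bytes) -> str:
    # Stand-in parser (hypothetical) that rejects anything before the first tag.
    if not data.startswith(b"<"):
        raise ValueError("unexpected bytes before the root element")
    return data.decode("utf-8")
```

Calling `load_doc_source(codecs.BOM_UTF8 + b"<doc/>", picky_parse)` raises `DocSourceError` with the message above, and the original `ValueError` is still reachable through `__cause__`. If you would rather tolerate the BOM than forbid it, the same `has_utf8_bom` check could strip the first three bytes before parsing instead.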
