Re: xerces-c createdocs.bat and the BOM character

Justin Dearing Tue, 30 Oct 2007 07:36:05 -0800

Thank you all for your quick responses.

On 10/30/07, Jesse Pelton <[EMAIL PROTECTED]> wrote:
> Actually, the XML spec discusses the UTF-8 BOM.  See
> http://www.w3.org/TR/2006/REC-xml-20060816/#sec-guessing-no-ext-info.
>
> Whether it makes sense is another question.  I suppose it could be used
> to quickly distinguish UTF-8 from ASCII and similar encodings.  Since
> conforming processors are required to handle UTF-8 and UTF-16, but no
> other encodings, this might have some value.
>


It seems to me, and I'm not an expert on Unicode at all, that it would
make sense for the doc building tools to handle the UTF-8 BOM
character if it sometimes gets inserted there. If, for the sake of
consistency, you want to explicitly forbid UTF-8 BOMs in the
documentation source but make the resulting error clearer, you
could do the following: catch the exception that is currently thrown
and make it the InnerException of a new exception with the
message "This document has a UTF-8 BOM character. Xerces
Documentation source files should not contain a UTF-8 BOM character."
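
To illustrate the wrapping idea, here is a rough sketch in Python. It is
not how the Xerces doc tools work; DocSourceError, parse_doc_source and
the parse callback are hypothetical names I made up for the example.
Python's "raise ... from" plays the role of InnerException by storing
the original error in __cause__.

```python
import codecs

# Message text taken from the suggestion above.
BOM_MESSAGE = (
    "This document has a UTF-8 BOM character. Xerces Documentation "
    "source files should not contain a UTF-8 BOM character."
)


class DocSourceError(Exception):
    """Hypothetical wrapper exception for documentation build errors."""


def has_utf8_bom(data: bytes) -> bool:
    # codecs.BOM_UTF8 is the byte sequence EF BB BF.
    return data.startswith(codecs.BOM_UTF8)


def parse_doc_source(data: bytes, parse):
    """Run a parser callback and re-raise BOM-related failures clearly.

    `parse` stands in for whatever parser the build uses (an assumption).
    """
    try:
        return parse(data)
    except Exception as exc:
        if has_utf8_bom(data):
            # Keep the original error attached as the cause.
            raise DocSourceError(BOM_MESSAGE) from exc
        raise  # Unrelated failure: leave it untouched.
```

The original exception stays reachable through __cause__, so nothing is
lost for anyone who needs the parser's own diagnostics.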

I now know to change a checkbox in XML Copy Editor to prevent the
issue from affecting me. I will talk to Gerald about the default behavior
of XML Copy Editor. However, GVIM handles the file with or without
the BOM, and I couldn't figure out what the problem was until I
generated a diff, which rendered the BOM as 2 printable characters in
the text file since it was no longer the first byte sequence in the
file. TortoiseMerge and WinMerge both indicate that the first line has
changed but do not display the BOM character, which is a bug on
their part, so I will report it to their respective maintainers.
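
For anyone else chasing an invisible first-line difference, a quick
check of the leading bytes settles it without relying on the editor or
diff tool. This is only a sketch; describe_bom and strip_utf8_bom are
names I invented, and the only fact assumed is that the UTF-8 BOM is
the byte sequence EF BB BF.

```python
import codecs


def describe_bom(data: bytes) -> str:
    """Report whether the data begins with the UTF-8 BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 BOM (EF BB BF)"
    return "no UTF-8 BOM"


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Reading the file in binary mode, for example open(path, "rb").read(),
and passing the result to describe_bom shows whether the editor added
the BOM on save.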

