Actually, the XML spec discusses the UTF-8 BOM. See http://www.w3.org/TR/2006/REC-xml-20060816/#sec-guessing-no-ext-info.
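
For what it's worth, here is a rough sketch of the BOM part of that detection, in Python. It only looks at the leading bytes of a file and covers the common signatures. The names sniff_bom and sniff_file are made up for illustration, and the sketch leaves out the unusual UCS-4 octet orders and the no-BOM heuristics that the spec section also describes.

```python
import codecs

# BOM signatures, checked longest first so that the 4-byte little-endian
# signature (FF FE 00 00) is not mistaken for UTF-16LE (FF FE).
# The XML spec calls the 4-byte forms UCS-4; Python's constants say UTF-32.
_BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "UTF-32LE"),  # FF FE 00 00
    (codecs.BOM_UTF8, "UTF-8"),         # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16BE"),  # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),  # FF FE
]


def sniff_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def sniff_file(path):
    """Hypothetical helper: inspect only the first four bytes of a file."""
    with open(path, "rb") as f:
        return sniff_bom(f.read(4))
```

A result of (None, 0) only means there is no BOM. For an XML file you would then fall back to the encoding declaration, or to UTF-8 if there is none.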
Whether it makes sense is another question. I suppose it could be used to quickly distinguish UTF-8 from ASCII and similar encodings. Since conforming processors are required to handle UTF-8 and UTF-16, but no other encodings, this might have some value.

-----Original Message-----
From: Boris Kolpackov [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 30, 2007 10:37 AM
To: [email protected]
Subject: Re: xerces-c createdocs.bat and the BOM character

Hi Justin,

Justin Dearing <[EMAIL PROTECTED]> writes:

> Gerald, the author of XML copy editor seems to think the BOM should be
> there as the docs are UTF-8 and it is a UTF-8 BOM.

BOM (byte order marker) does not make any sense for UTF-8 since it is a 1-byte encoding.

> 1) What is the intended encoding of the documentation? Since the documents
> are written in English my understanding is UTF-8 would work just fine but I
> don't know a lot about unicode.

UTF-8.

> 2) Does the java tool that builds the documentation handle BOMs correctly
> for UTF-8 or is my editor at fault.

There is no such thing as BOM for UTF-8.

> 3) As a developer working on a windows platform how would I get encoding
> information about a file?

I assume you are talking about .xml files in the doc/ directory. In this case: those XML files do not explicitly state their encoding (in XML declaration) so it defaults to UTF-8.

> 4) As a developer working on a unix platform how would I get encoding
> information about a file?

Ditto.

Boris

--
Boris Kolpackov
Code Synthesis Tools CC
http://www.codesynthesis.com
Open-Source, Cross-Platform C++ XML Data Binding

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
