RE: xerces-c createdocs.bat and the BOM character

Jesse Pelton Tue, 30 Oct 2007 07:03:48 -0800

Actually, the XML spec discusses the UTF-8 BOM.  See
http://www.w3.org/TR/2006/REC-xml-20060816/#sec-guessing-no-ext-info.

Whether it makes sense is another question.  I suppose it could be used
to quickly distinguish UTF-8 from ASCII and similar encodings.  Since
conforming processors are required to handle UTF-8 and UTF-16, but no
other encodings, this might have some value.
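
For what it's worth, the byte patterns listed in that section of the spec
can be checked with a few lines of Python. This is only a rough sketch:
sniff_bom and sniff_file are names made up here, it looks only at a
leading BOM, and it does not attempt the spec's further heuristics for
files without one (such as reading the encoding declaration).

```python
import codecs

# Checked in order. UTF-32 comes first because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
]


def sniff_bom(data: bytes):
    """Return the encoding family implied by a leading BOM, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def sniff_file(path):
    """Read the first four bytes of a file and report its BOM, if any."""
    with open(path, "rb") as f:
        return sniff_bom(f.read(4))
```

A result of None means no BOM was found; per the spec, an XML entity with
neither a BOM nor an encoding declaration must be UTF-8.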

-----Original Message-----
From: Boris Kolpackov [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 30, 2007 10:37 AM
To: [email protected]
Subject: Re: xerces-c createdocs.bat and the BOM character

Hi Justin,

Justin Dearing <[EMAIL PROTECTED]> writes:

> Gerald, the author of XML copy editor seems to think the BOM should be
> there as the docs are UTF-8 and it is a UTF-8 BOM.

BOM (byte order marker) does not make any sense for UTF-8 since it is
a 1-byte encoding.

> 1) What is the intended encoding of the documentation? Since the documents
> are written in English my understanding is UTF-8 would work just fine but I
> don't know a lot about unicode.

UTF-8.

> 2) Does the java tool that builds the documentation handle BOMs correctly
> for UTF-8 or is my editor at fault.

There is no such thing as BOM for UTF-8.

> 3) As a developer working on a windows platform how would I get encoding
> information about a file?

I assume you are talking about the .xml files in the doc/ directory. In
this case, those XML files do not explicitly state their encoding (in the
XML declaration), so it defaults to UTF-8.

> 4) As a developer working on a unix platform how would I get encoding
> information about a file?

Ditto.

Boris

-- 
Boris Kolpackov
Code Synthesis Tools CC
http://www.codesynthesis.com
Open-Source, Cross-Platform C++ XML Data Binding

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

