Michiel Meeuwissen <[EMAIL PROTECTED]> wrote:

> I fixed it now too for 'external' includes (includes from another server).
>
> If the content is XML (it starts with a <?xml declaration), the encoding is
> taken from that declaration. If none is given there, it defaults to UTF-8.
>
> If the content is not XML, the encoding is taken from the Content-Type
> header (charset=..). If that cannot be found either, the encoding is
> assumed to be ISO-8859-1 (the default for HTML).
>
> That means that if the Content-Type is text/xml;charset=ISO-8859-1, that
> charset is actually ignored if the body starts with a <?xml header. I
> think that is good, because:
>
> 1. A lot of Tomcat versions append this ISO-8859-1 charset unasked,
>    and you cannot override it.
>
> 2. I think text/xml is actually binary/xml, and no charset can be
>    externally attributed to it, because IIUC XML defaults to UTF-8 and
>    otherwise the encoding is in the <?xml header, so inside the stream
>    itself.
>
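For reference, the rule described above could be sketched roughly like
this. It is only a minimal Python illustration of the quoted description,
not the actual code; the function name guess_encoding and its parameters
are made up for the example, and it assumes the body is available as raw
bytes without a byte order mark.

```python
import re

# Hypothetical helper illustrating the quoted rule; not the real implementation.
XML_DECL = re.compile(rb'^<\?xml\s([^>]*)\?>')
XML_ENCODING = re.compile(rb'encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')
CHARSET = re.compile(r'charset\s*=\s*"?([^\s;"]+)', re.IGNORECASE)


def guess_encoding(body: bytes, content_type: str = "") -> str:
    """Pick an encoding for an included response, following the quoted rule."""
    if body.startswith(b"<?xml"):
        # XML: use the declaration and ignore any charset in Content-Type.
        decl = XML_DECL.match(body)
        if decl:
            enc = XML_ENCODING.search(decl.group(1))
            if enc:
                return enc.group(1).decode("ascii")
        # No encoding in the declaration: fall back to the XML default.
        return "UTF-8"
    # Not XML: use the charset parameter, else the default for HTML.
    m = CHARSET.search(content_type or "")
    return m.group(1) if m else "ISO-8859-1"
```

With this sketch, a body starting with <?xml version="1.0"?> that is served
as text/xml;charset=ISO-8859-1 is read as UTF-8, which is exactly the case
discussed above.
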
You are probably right that the ISO-8859-1 encoding should be specified in
the text/xml content-type.

By the way, both the MIME types text/xml and application/xml can be used
for XML; see

http://www.zvon.org/tmRFC/RFC2376/Output/chapter3.html

I found more information in

http://www.w3.org/TR/2002/NOTE-xhtml-media-types-20020801/

which says:

Authors should also be aware of the difference between 'application/xml'
(and for that matter 'application/xhtml+xml' as well) and 'text/xml' with
regard to the treatment of character encoding. According to "3.1 Text/xml
Registration" of [RFC3023], if a text/xml entity is received with the
charset parameter omitted, MIME processors and XML processors MUST use the
default charset value of "us-ascii"[ASCII]. This default value is
authoritative over the encoding information specified in the XML
declaration, or the XML default encodings of UTF-8 and UTF-16 when no
encoding declaration is supplied, so omitting the charset parameter of a
'text/xml' entity might cause an unexpected result. As mentioned in
[RFC3023], the use of the charset parameter is STRONGLY RECOMMENDED.
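
To make the difference concrete, here is a small Python sketch of how I
read the RFC 3023 rule. The function name rfc3023_charset and the
"declared" argument (the encoding from the <?xml declaration, if any) are
only for illustration, and the sketch leaves out byte order mark and UTF-16
detection for the application/xml case.

```python
from email.message import Message
from typing import Optional


def rfc3023_charset(content_type: str, declared: Optional[str] = None) -> str:
    """Return the charset to use under my reading of RFC 3023 (illustrative)."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")
    if isinstance(charset, str) and charset:
        # An explicit charset parameter is authoritative for both types.
        return charset
    if msg.get_content_type() == "text/xml":
        # text/xml without charset: us-ascii, whatever the XML declaration says.
        return "us-ascii"
    # Otherwise (e.g. application/xml): fall back to the XML declaration,
    # then to the XML default of UTF-8.
    return declared or "UTF-8"
```

So text/xml without a charset parameter gives us-ascii even if the document
declares ISO-8859-1, while application/xml without one follows the
declaration.
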

Martijn Houtman