Michiel Meeuwissen <[EMAIL PROTECTED]> wrote:
> I fixed it now too for 'external' includes (includes from another server).
>
> If the content is XML (<?xml introduction), the encoding is taken from that.
> If it is not in that header, it defaults to UTF-8.
>
> If the content is not XML, the encoding is taken from the Content-Type header
> (charset=..). If that cannot be found either, the encoding is supposed to be
> ISO-8859-1 (the default for HTML).
>
> That means that if the Content-Type is text/xml;charset=ISO-8859-1, that
> charset is actually ignored if the body starts with a <?xml header. I think
> that is good, because
>
> 1. a lot of Tomcat versions append this ISO-8859-1 thing completely
>    unrequested, and you cannot override it.
>
> 2. I think text/xml is actually binary/xml, and no charset can be externally
>    attributed to it, because IIUC, XML is default UTF-8 and otherwise it is
>    in the <?xml header, so inside the stream itself.

You are probably right that the ISO-8859-1 encoding should be specified in the text/xml content-type.
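
For reference, the detection order described in the quoted message could be sketched roughly as below. This is only an illustration in Python, not the actual fix: the function name `detect_encoding` and both regular expressions are made up for the example, and it assumes the body is in an ASCII-compatible encoding (so no UTF-16 or byte-order-mark handling).

```python
import re
from typing import Optional

# Hypothetical helper patterns, not taken from the real code.
# Encoding pseudo-attribute inside a leading <?xml ...?> declaration.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)
# charset parameter of a Content-Type header value.
_CHARSET = re.compile(r'charset\s*=\s*"?([^\s;"]+)', re.IGNORECASE)


def detect_encoding(body: bytes, content_type: Optional[str]) -> str:
    """Pick an encoding using the order described in the quoted message."""
    if body.startswith(b"<?xml"):
        # XML: the declaration wins; the Content-Type charset is ignored.
        match = _XML_DECL.match(body)
        return match.group(1).decode("ascii") if match else "UTF-8"
    if content_type:
        # Not XML: fall back to the charset parameter of the header.
        match = _CHARSET.search(content_type)
        if match:
            return match.group(1)
    # Nothing found: assume the HTML default.
    return "ISO-8859-1"
```

With this order, a body starting with `<?xml version="1.0"?>` is treated as UTF-8 even when the header says `text/xml;charset=ISO-8859-1`, which is exactly the case discussed above.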

By the way, both the MIME types text/xml and application/xml can be used for XML, see
http://www.zvon.org/tmRFC/RFC2376/Output/chapter3.html

I found more information in
http://www.w3.org/TR/2002/NOTE-xhtml-media-types-20020801/
which says:

  Authors should also be aware of the difference between 'application/xml'
  (and for that matter 'application/xhtml+xml' as well) and 'text/xml' with
  regard to the treatment of character encoding. According to "3.1 Text/xml
  Registration" of [RFC3023], if a text/xml entity is received with the
  charset parameter omitted, MIME processors and XML processors MUST use the
  default charset value of "us-ascii"[ASCII]. This default value is
  authoritative over the encoding information specified in the XML
  declaration, or the XML default encodings of UTF-8 and UTF-16 when no
  encoding declaration is supplied, so omitting the charset parameter of a
  'text/xml' entity might cause an unexpected result. As mentioned in
  [RFC3023], the use of the charset parameter is STRONGLY RECOMMENDED.

Martijn Houtman
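
P.S. To make the RFC 3023 rule quoted above concrete, here is a small Python sketch of what the Content-Type header alone decides. The function name `rfc3023_charset` and the regular expression are invented for this illustration. The text/xml branch follows the quoted note; returning None for other types such as application/xml is my reading of RFC 3023, meaning "look inside the XML stream itself".

```python
import re
from typing import Optional

# Hypothetical pattern for the charset parameter of a Content-Type value.
_CHARSET_PARAM = re.compile(r'charset\s*=\s*"?([^\s;"]+)', re.IGNORECASE)


def rfc3023_charset(content_type: str) -> Optional[str]:
    """Return the charset the header makes authoritative, or None if the
    header leaves the decision to the XML declaration."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    match = _CHARSET_PARAM.search(content_type)
    if match:
        # An explicit charset parameter overrides the XML declaration.
        return match.group(1)
    if media_type == "text/xml":
        # Omitted charset on text/xml means us-ascii, per the quoted note.
        return "us-ascii"
    # No charset given: the XML processor's own detection applies.
    return None
```

So a bare `text/xml` header yields us-ascii regardless of what the `<?xml` header says, which is the "unexpected result" the note warns about.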
