Michiel Meeuwissen <[EMAIL PROTECTED]> wrote:
> Martijn Houtman <[EMAIL PROTECTED]> wrote:
> > I did not use your new code yet, because I thought it was easier to wait for
> > the binary version. I don't expect troubles with the new code, but I
> > understand that you are curious about the result of your work. With the old
> > code everything works for ISO-8859-1 if I remove the ISO-8859-1 encoding
> > from the xml header.
> 
> For external includes it now depends completely on what the external server
> reports the encoding to be. I suppose that the encoding is not reported for
> XML files (because XML files are binary..).
> 
> I don't really understand how removing the header could make a difference for
> an external XML, because it should be present if the encoding is like that,
> and I can't understand how the external server succeeds in serving it as
> UTF-8 when the file is actually ISO-8859-1. But anyhow, I'll make sure that
> it works in my setup..

I have now fixed it for 'external' includes too (includes from another server).

If the content is XML (it starts with a <?xml declaration), the encoding is
taken from that declaration. If no encoding is given there, it defaults to UTF-8.

If the content is not XML, the encoding is taken from the Content-Type header
(charset=..). If that cannot be found either, the encoding is assumed to be
ISO-8859-1 (the default for HTML).
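
To make the order of those rules concrete, here is a rough sketch in Python. It
is not the actual code from the fix; the function name detect_encoding and the
regular expressions are made up for illustration. It also assumes the <?xml
declaration sits at the very first byte of the body (no BOM, no leading
whitespace).

```python
import re
from typing import Optional

# Hypothetical helper patterns, not taken from the real implementation.
# An XML declaration at the very start of the body.
XML_DECL_RE = re.compile(rb"^<\?xml\s([^>]*)\?>")
# The encoding pseudo-attribute inside that declaration.
XML_ENCODING_RE = re.compile(rb"""encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")
# The charset parameter of a Content-Type header value.
CHARSET_RE = re.compile(r"""charset\s*=\s*["']?([^\s;"']+)""", re.IGNORECASE)


def detect_encoding(body: bytes, content_type: Optional[str] = None) -> str:
    """Pick an encoding for an externally included resource."""
    decl = XML_DECL_RE.match(body)
    if decl:
        # XML: only the declaration counts; Content-Type is ignored.
        enc = XML_ENCODING_RE.search(decl.group(1))
        return enc.group(1).decode("ascii") if enc else "UTF-8"
    if content_type:
        # Not XML: use the charset reported by the server, if any.
        charset = CHARSET_RE.search(content_type)
        if charset:
            return charset.group(1)
    # Nothing found: fall back to the HTML default.
    return "ISO-8859-1"


if __name__ == "__main__":
    body = b'<?xml version="1.0"?><doc/>'
    print(detect_encoding(body, "text/xml;charset=ISO-8859-1"))
```

The call at the bottom passes a body with an <?xml declaration that names no
encoding, together with a Content-Type that claims ISO-8859-1.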

That means that if the Content-Type is text/xml;charset=ISO-8859-1, that
charset is actually ignored if the body starts with a <?xml header. I think
that is good, because:
1. A lot of Tomcat versions append this ISO-8859-1 charset without being asked
   to, and you cannot override it.

2. I think text/xml is actually binary/xml, and no charset can be externally
   attributed to it, because IIUC, XML is UTF-8 by default and otherwise the
   encoding is given in the <?xml header, so inside the stream itself.


Michiel

-- 
Michiel Meeuwissen
Mediacentrum 140 H'sum 
+31 (0)35 6772979
nl_NL eo_XX en_US
mihxil'