The offending char is contained in the welcome.xslt stylesheet that is encoded as ISO-8859-1.
The pipeline does
- welcome.xml -> ISO-8859-1
- welcome.xslt -> ISO-8859-1
- xhtml serializer -> UTF-8
The results are indeed encoded using UTF-8, thus the copyright sign (U+00A9) ends up being 16 bits: it is the single byte 0xA9 in ISO-8859-1, but the two bytes 0xC2 0xA9 in UTF-8. (UTF-8 is a clever variable-length encoding that uses one to four bytes per character. ASCII stays at one byte, which was done for easy back compatibility and compression, since most text is in that lower range nowadays. UTF-16 is more even in that respect, but it is rarely used for this because mostly-ASCII text is normally half as big in UTF-8.)
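To make the byte counts concrete, here is a small Python sketch (Python is my choice for illustration, it is not part of the pipeline) that encodes the copyright sign both ways and shows what a user-agent sees if it reads the UTF-8 bytes as ISO-8859-1. The variable names are made up for the example.

```python
# The copyright sign, U+00A9.
COPYRIGHT = "\u00a9"

# One byte in ISO-8859-1, two bytes in UTF-8.
latin1_bytes = COPYRIGHT.encode("iso-8859-1")
utf8_bytes = COPYRIGHT.encode("utf-8")

print(latin1_bytes)  # b'\xa9'
print(utf8_bytes)    # b'\xc2\xa9'

# A reader that wrongly assumes ISO-8859-1 turns the two UTF-8 bytes
# into two separate characters instead of one copyright sign.
misread = utf8_bytes.decode("iso-8859-1")
print(len(misread))  # 2
```

The two-character result is the kind of garbage you get whenever the declared (or guessed) encoding does not match the bytes actually sent.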
On MacOSX, the results are interesting:
- mozilla 1.3b (20030212) displays the correct encoding
- safari 1.0b(v60) doesn't
- camino 0.7 (2003030613) displays the correct encoding
- IE 5.2.2 (5010.1) doesn't
I traced the problem down to the fact that, apparently, both IE and Safari are *NOT* able to pick up the encoding from the XML declaration at the start of the document (the <?xml ... ?> line, the "XML PI").
On the other hand, placing
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
in the <head> of the output gives the user-agent the equivalent of an HTTP Content-Type header (that is what http-equiv stands for) that tells it the encoding. This solved the encoding problem on *all* browsers.
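For comparison, the charset can also travel in a real HTTP Content-Type header rather than in the markup. The sketch below is a minimal WSGI application in Python, not Cocoon configuration; `PAGE`, `app` and `serve` are hypothetical names, and the `text/html` media type is simply copied from the meta element above.

```python
from wsgiref.simple_server import make_server

# Hypothetical page containing the copyright sign.
PAGE = "<html><head><title>\u00a9 test</title></head><body>\u00a9</body></html>"


def app(environ, start_response):
    """Serve PAGE as UTF-8 and say so in the Content-Type header."""
    body = PAGE.encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def serve(port=8000):
    """Run the app locally (blocks until interrupted)."""
    with make_server("localhost", port, app) as httpd:
        httpd.serve_forever()
```

Calling `serve()` and loading http://localhost:8000/ is one way to check how a given browser behaves when the header, and not the document, carries the encoding.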
Results:
1) this is *NOT* a Cocoon issue
2) be aware that some user-agents do not parse the XML PI to get the encoding, but rely only on the HTTP headers.
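The behaviour described above can be modelled roughly as follows. This is only a sketch of one plausible lookup order (HTTP header, then XML declaration, then meta element, then a default), not the actual logic of any browser. The function name, the `honor_xml_declaration` flag and the ISO-8859-1 default are all assumptions made for the example.

```python
import re

# Patterns for the three places an encoding can be declared.
HEADER_CHARSET = re.compile(r'charset=["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)
XML_DECL = re.compile(rb'<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\']')
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9._-]+)',
                          re.IGNORECASE)


def guess_encoding(content_type, body, honor_xml_declaration=True,
                   default="iso-8859-1"):
    """Return (encoding, source) for a response, trying each hint in turn."""
    # 1. An explicit charset in the HTTP Content-Type header.
    if content_type:
        m = HEADER_CHARSET.search(content_type)
        if m:
            return m.group(1).lower(), "http-header"
    # 2. The XML declaration, which must be at the very start of the body.
    if honor_xml_declaration:
        m = XML_DECL.match(body)
        if m:
            return m.group(1).decode("ascii").lower(), "xml-declaration"
    # 3. A meta http-equiv element near the top of the document.
    m = META_CHARSET.search(body[:1024])
    if m:
        return m.group(1).decode("ascii").lower(), "meta"
    # 4. Nothing declared: fall back to the assumed default.
    return default, "default"
```

With `honor_xml_declaration=False`, a page that declares UTF-8 only in its XML declaration falls through to the default, which reproduces the failure. Adding the meta element (or a charset in the header) makes the same call return UTF-8.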
NOTES:
1) there is no clear indication in the XHTML specification of how user-agents are supposed to determine the encoding
2) there is no indication of what MIME type the XHTML content should have.
These problems reflect the lack of direct collaboration between the IETF and the W3C on the XML/HTTP relationship. Unfortunately, this is only going to get worse. So be prepared, especially for heavily internationalized content.
Stefano.