On 21 May 2010 at 12:03, Thorsten Scherler wrote:
>> The text returned by that URI is:
>>
>> <?xml version="1.0" encoding="ISO-8859-1"?><div id="content"><h1>Divvun -
>> Sámi proofing tools project</h1><div id="content-main">
>>
>> <div class="note"><div class="label">UTF-8 character test</div><div
>> class="content">
>> There seems to be problems with certain characters, but only in
>> Dispatcher:<br xmlns:xi="http://www.w3.org/2001/XInclude"/>
>> a á c č d đ n ŋ s š t ŧ z ž ae æ
>> oe ø ao å a¨ ä o¨ ö g ǥ h ħ u ʉ i ɨ
>> </div></div>
>>
>> </div></div>
>>
>> Two things to note here:
>>
>> The encoding is specified as ISO-8859-1, which is wrong,
>
> Yes, it should be UTF-8.
>
>> ...
>> I don't know where the encoding comes from - everything on my end is marked
>> as UTF-8. I grepped for the string "ISO-8859-1" in the Forrest sources, and
>> got many hits, but nothing that seemed to relate to Dispatcher.
>
> The *.body.xml comes from the dataModel.xmap:
>
> <!-- HTML rendered from intermediate format -->
> <map:match pattern="**.body.xml">
>   <map:generate src="cocoon:/{1}.source.rewritten.xml" />
>   <map:transform src="{lm:dataModel-html-document-to-html.xsl}">
>     <map:parameter name="path" value="{1}.html" />
>   </map:transform>
>   <map:serialize />
> </map:match>
>
> The serializer here is the default one.
>
> We define it in the xmap as
>
> <map:serializers default="xml" />
>
> That should read:
> <map:serializers default="xml-utf8" />
>
> I added that in revision 946939; please see whether it fixes the issue. I
> added a test note to
> org.apache.forrest.plugin.internal.dispatcher/src/documentation/content/xdocs/index.xml
> so you can directly run "forrest run" in the plugin and see the outcome.

I tested it using my own site (the same document as earlier) - and your change
FIXED the bug :) All instances of garbled UTF-8 characters are now fixed, both
in the body text and elsewhere.

Thanks a lot!

Best,
Sjur
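
For the archive, here is a minimal sketch of why a wrong encoding label matters for the test characters quoted above. It is plain Python, not Forrest or Cocoon code, and it does not claim to reproduce the exact path the bytes took inside Dispatcher. The function names are made up for illustration. It shows two things: what UTF-8 bytes look like when a consumer trusts an ISO-8859-1 declaration, and what a serializer restricted to ISO-8859-1 has to do with characters such as č that the charset cannot represent.

```python
# Illustrative only: simulates an encoding mismatch with the Python codecs.
# SAMPLE is taken from the character test quoted in the thread.
SAMPLE = "a á c č d đ n ŋ"


def misread_as_latin1(text: str) -> str:
    """Encode as UTF-8, then decode the bytes as ISO-8859-1.

    This mimics a consumer that believes an encoding="ISO-8859-1"
    declaration while the bytes are really UTF-8.
    """
    return text.encode("utf-8").decode("iso-8859-1")


def serialize_as_latin1(text: str) -> bytes:
    """Encode as ISO-8859-1, escaping anything outside that charset.

    Characters that ISO-8859-1 cannot represent become numeric
    character references, the usual XML fallback.
    """
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


if __name__ == "__main__":
    print(repr(misread_as_latin1(SAMPLE)))
    print(serialize_as_latin1(SAMPLE))
```

The misreading is reversible (re-encode as ISO-8859-1, then decode as UTF-8), which is a quick way to check whether garbled output is this kind of mismatch or actual data loss.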