Re: Latin1 character problems in dispatcher

Sjur Moshagen Fri, 21 May 2010 03:06:36 -0700

On 21 May 2010, at 12:03, Thorsten Scherler wrote:

>> The text returned by that Uri is:
>> 
>> <?xml version="1.0" encoding="ISO-8859-1"?><div id="content"><h1>Divvun - 
>> Sámi proofing tools project</h1><div id="content-main">
>> 
>>        <div class="note"><div class="label">UTF-8 character test</div><div 
>> class="content">
>>              There seems to be problems with certain characters, but only in
>>              Dispatcher:<br xmlns:xi="http://www.w3.org/2001/XInclude"/>
>>              a á c &#269; d &#273; n &#331; s &#353; t &#359; z &#382; ae æ 
>> oe ø ao å a¨ ä o¨ ö g &#485; h &#295; u &#649; i &#616;
>>        </div></div>
>> 
>> </div></div>
>> 
>> Two things to note here:
>> 
>> The encoding is specified as ISO-8859-1, which is wrong,
> 
> Yes, it should be UTF-8.
>>


...

>> I don't know where the encoding comes from - everything on my end is marked 
>> as UTF-8. I grepped for the string "ISO-8859-1" in the Forrest sources, and 
>> got many hits, but nothing that seemed to relate to Dispatcher.
> 
> The *.body.xml comes from the dataModel.xmap:
> 
> <!-- HTML rendered from intermediate format -->
> <map:match pattern="**.body.xml">
>   <map:generate src="cocoon:/{1}.source.rewritten.xml" />
>   <map:transform src="{lm:dataModel-html-document-to-html.xsl}">
>     <map:parameter name="path" value="{1}.html" />
>   </map:transform>
>   <map:serialize />
> </map:match>
> 
> The serializer here is the default one.
> 
> We define it in the xmap as
> 
> <map:serializers default="xml" />
> 
> That should read:
> <map:serializers default="xml-utf8" />
> 
> I committed this in revision 946939; please see whether that fixes the issue. I added a
> test note to
> org.apache.forrest.plugin.internal.dispatcher/src/documentation/content/xdocs/index.xml
> so you can run "forrest run" directly in the plugin and see the outcome.
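For readers unfamiliar with Cocoon sitemaps: the `default` attribute on `<map:serializers>` names the serializer that a bare `<map:serialize />` falls back to, so switching it from `xml` to `xml-utf8` changes the encoding of every such pipeline. The fragment below is a minimal sketch, assuming a Cocoon 2.1-style sitemap in which `xml-utf8` is backed by `org.apache.cocoon.serialization.XMLSerializer` with an `<encoding>` setting. In Forrest the serializer declaration is normally inherited from a parent sitemap, so the actual declaration may live elsewhere and differ in detail. It also shows the alternative of naming the serializer explicitly with `type`, so the match no longer depends on the default.

```xml
<map:components>
  <!-- Sketch (assumed declaration): an XML serializer that writes UTF-8.
       A plugin sitemap that inherits this only needs the default attribute. -->
  <map:serializers default="xml-utf8">
    <map:serializer name="xml-utf8" mime-type="text/xml"
                    src="org.apache.cocoon.serialization.XMLSerializer">
      <encoding>UTF-8</encoding>
    </map:serializer>
  </map:serializers>
</map:components>

<!-- Alternative: select the serializer explicitly instead of via the default -->
<map:match pattern="**.body.xml">
  <map:generate src="cocoon:/{1}.source.rewritten.xml" />
  <map:transform src="{lm:dataModel-html-document-to-html.xsl}">
    <map:parameter name="path" value="{1}.html" />
  </map:transform>
  <map:serialize type="xml-utf8" />
</map:match>
```

With either variant, the output should start with `<?xml version="1.0" encoding="UTF-8"?>` rather than the ISO-8859-1 declaration seen above. The explicit `type` is more robust if the default is later changed, at the cost of repeating the serializer name in each match.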

I tested it using my own site (the same document as earlier), and your change
FIXED the bug :)

All instances of garbled UTF-8 characters are now fixed, both in the body text
and elsewhere.

Thanks a lot!

Best,
Sjur
