Sjur N. Moshagen (JIRA) wrote:
> 
> Sjur N. Moshagen commented on FOR-435:
> --------------------------------------
> 
> Earlier investigation in our project has shown that the Chaperon grammar is 
> using the default Java file encoding when reading files, and that the default 
> Java encoding is given by the OS, in our case MacOS X, which has MacRoman as 
> default. Reading UTF-8 encoded files as MacRoman will of course garble 
> non-ASCII characters.
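To make that failure mode concrete, here is a small Java sketch (not Chaperon's actual code; the class and method names are made up for illustration). It reads the same UTF-8 bytes twice, once with a single-byte charset and once with UTF-8. ISO-8859-1 stands in for the platform default because it is always available and matches the "Latin-1 counterpart of each byte" symptom in the issue report. MacRoman damages the text in the same byte-by-byte way, only with different characters.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Read a whole stream using an explicitly chosen charset.
    // new InputStreamReader(in) with no charset argument would use the
    // platform default instead, which is the behaviour described above.
    static String readAll(InputStream in, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(in, cs)) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            // Wrapped so callers of this demo need not handle a checked exception.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00F8 (o with stroke) is stored in UTF-8 as the two bytes 0xC3 0xB8.
        byte[] fileBytes = "\u00f8".getBytes(StandardCharsets.UTF_8);

        // Wrong: each byte becomes its own Latin-1 character (U+00C3, U+00B8).
        String garbled = readAll(new ByteArrayInputStream(fileBytes), StandardCharsets.ISO_8859_1);
        // Right: the two bytes decode back into a single character.
        String correct = readAll(new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8);

        System.out.println("garbled length = " + garbled.length()); // 2
        System.out.println("correct length = " + correct.length()); // 1
    }
}
```

Lengths are printed instead of the strings themselves so the result does not depend on the console's own encoding.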

Perhaps this issue also affects the core of Forrest.
IIUC from our sitemaps, we use Chaperon to extract
links from CSS files.

> Today I put some effort into finding a work-around based on this insight, and 
> the result is the following command line argument:
> 
> forrest run -Dforrest.jvmargs="-Dfile.encoding=utf-8"
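One way to confirm that the flag actually reached the JVM is to print the effective default charset from inside it. Below is a minimal Java sketch (the class name is hypothetical). Run it once with `-Dfile.encoding=utf-8` and once without, then compare the output. On Java 18 and later the default charset is already UTF-8 on every platform (JEP 400), so the flag only makes a visible difference on older JVMs.

```java
import java.nio.charset.Charset;

public class ShowDefaultEncoding {
    // Summarise what the running JVM uses when no charset is given explicitly.
    static String report() {
        return "file.encoding = " + System.getProperty("file.encoding")
                + "\ndefaultCharset = " + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // e.g. java -Dfile.encoding=utf-8 ShowDefaultEncoding
        System.out.println(report());
    }
}
```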

Perhaps this should be an available forrest property.

> It doesn't really solve the underlying problem of configuring Chaperon from 
> within Forrest (or Cocoon), but it does solve our actual problem through a 
> work-around.

Does the Cocoon chaperon block need some configurability
added? Also does our Chaperon jar need updating?
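As a rough illustration of what such configurability could amount to, the sketch below resolves the input charset from an `encoding` parameter (the name used in the sitemap snippet quoted in the issue). When the parameter is absent, it falls back to a fixed, explicit charset rather than the platform default. This is a hypothetical helper, not the Cocoon or Chaperon API; the class and method names are invented.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class EncodingParameter {
    // Hypothetical: pick the charset for reading an input document from a
    // sitemap-style parameter map instead of silently using the platform default.
    static Charset resolve(Map<String, String> params, Charset fallback) {
        String name = params.get("encoding");
        if (name == null || name.trim().isEmpty()) {
            return fallback; // explicit, predictable default
        }
        // Charset names are case-insensitive; unknown or malformed names make
        // Charset.forName throw rather than quietly falling back.
        return Charset.forName(name.trim());
    }

    public static void main(String[] args) {
        System.out.println(resolve(Map.of("encoding", "utf-8"), StandardCharsets.ISO_8859_1));
        System.out.println(resolve(Map.of(), StandardCharsets.UTF_8));
    }
}
```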

You mentioned an important mail thread below, but could
not provide the link at the time.

Thanks very much for your investigation and other efforts.

-David

> > Wiki input files (*.jspwiki) is not correctly read when in UTF-8
> > ----------------------------------------------------------------
> >
> >          Key: FOR-435
> >          URL: http://issues.apache.org/jira/browse/FOR-435
> >      Project: Forrest
> >         Type: Bug
> >   Components: Plugin: input.wiki
> >     Versions: 0.8-dev, 0.7
> >  Environment: MacOS X, 10.3.8, Java 1.4.2
> >     Reporter: Sjur N. Moshagen
> >
> > According to the documentation at:
> > http://chaperon.sourceforge.net/using-cocoon.html
> > it should be possible to configure the Wiki plugin (or any plugin based on 
> > Chaperon) for different encodings of the input file, in my case UTF-8.
> > But this does not work. I have:
> >       <map:transformer name="lexer"
> >           src="org.apache.cocoon.transformation.LexicalTransformer"
> >           logger="sitemap.transformer.lexer">
> >         <map:parameter name="localizable" value="true"/>
> >         <map:parameter name="encoding" value="UTF-8"/>
> >       </map:transformer>
> > in the input.xmap file in $FORREST_HOME/plugins/wiki, and I have run "ant 
> > local-deploy", but to no avail: multibyte UTF-8 sequences come out as the 
> > Latin-1 counterpart of each byte in the sequence.
> > A discussion about this bug can be found at:
> > [mail archive not yet updated, will add link here later]