Re: [jira] Commented: (FOR-435) Wiki input files (*.jspwiki) is not correctly read when in UTF-8

Sjur Moshagen Tue, 30 May 2006 23:45:28 -0700

On 31 May 2006, at 04:10, David Crossley wrote:

Sjur N. Moshagen (JIRA) wrote:
Sjur N. Moshagen commented on FOR-435:
--------------------------------------
Earlier investigation in our project has shown that the Chaperon grammar is using the default Java file encoding when reading files, and that the default Java encoding is given by the OS, in our case MacOS X, which has MacRoman as default. Reading UTF-8 encoded files as MacRoman will of course garble non-ASCII characters.
Perhaps this issue also affects the core of Forrest.
IIUC from our sitemaps, we use Chaperon to extract
links from CSS files.


It sounds at least like a potential source of problems.

Today I put some effort into finding a work-around based on this insight, and the result is the following command line argument:
forrest run -Dforrest.jvmargs="-Dfile.encoding=utf-8"
Perhaps this should be an available forrest property.

That would be very nice, although it should be made clear in the documentation that it can affect more than Chaperon. The parameter overrides the OS-provided default file encoding, and sets the specified file encoding as default for the Java VM. Thus, all file readers not specifying the encoding will use it.
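
To illustrate the point about readers that do not specify an encoding, here is a minimal Java sketch. It is not taken from Chaperon or Forrest; the class and method names are hypothetical, and it assumes Java 8 or later (StandardCharsets and UncheckedIOException are newer than the Java 1.4.2 listed in the issue). It decodes the same UTF-8 bytes once with an explicit charset and once with the JVM default, which is the value that -Dfile.encoding is meant to influence at startup.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitEncodingDemo {

    /** Decodes the given bytes through a Reader that uses the given charset. */
    static String decodeWithReader(byte[] fileBytes, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(fileBytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the content of a UTF-8 encoded input file (hypothetical sample text).
        String expected = "bl\u00e5b\u00e6r";
        byte[] fileBytes = expected.getBytes(StandardCharsets.UTF_8);

        // The JVM-wide default, normally derived from the OS settings.
        Charset jvmDefault = Charset.defaultCharset();
        System.out.println("JVM default charset: " + jvmDefault);

        // A reader that names its encoding is independent of the default.
        String explicit = decodeWithReader(fileBytes, StandardCharsets.UTF_8);
        System.out.println("explicit UTF-8 correct: " + explicit.equals(expected));

        // A reader relying on the default is only correct if the default matches the file.
        String implicit = decodeWithReader(fileBytes, jvmDefault);
        System.out.println("JVM default correct:    " + implicit.equals(expected));
    }
}
```

Running it with and without -Dfile.encoding=utf-8 on a system whose default is not UTF-8 should make the difference in the last line visible, while the explicit reader gives the same result either way.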

It doesn't really solve the underlying problem of configuring Chaperon from within Forrest (or Cocoon), but it does solve our actual problem through a work-around.
Does the Cocoon chaperon block need some configurability
added?

AFAIR (it is a long time since I tried this), the Chaperon documentation claims the file reading encoding to be configurable, but I could not get it to work. Whether that was my mistake or a bug in Chaperon is beyond me :-)

Also does our Chaperon jar need updating?

You mentioned an important mail thread below, but could
not provide the link at the time.

The link is provided in the first comment in the issue, just below the "empty link" text.

Thanks very much for your investigation and other effort.


Thank you (and all the others) for your work with Forrest!

-David


Sjur

Wiki input files (*.jspwiki) is not correctly read when in UTF-8
----------------------------------------------------------------

         Key: FOR-435
         URL: http://issues.apache.org/jira/browse/FOR-435
     Project: Forrest
        Type: Bug

  Components: Plugin: input.wiki
    Versions: 0.8-dev, 0.7
 Environment: MacOS X, 10.3.8, Java 1.4.2
    Reporter: Sjur N. Moshagen

According to the documentation at:
http://chaperon.sourceforge.net/using-cocoon.html
it should be possible to configure the Wiki plugin (or any plugin based on Chaperon) for different encodings of the input file, in my case UTF-8.
But this does not work. I have:
      <map:transformer name="lexer"
                       src="org.apache.cocoon.transformation.LexicalTransformer"
                       logger="sitemap.transformer.lexer">
        <map:parameter name="localizable" value="true"/>
        <map:parameter name="encoding" value="UTF-8"/>
      </map:transformer>
in the input.xmap file in $FORREST_HOME/plugins/wiki, and I have run "ant local-deploy", but to no avail: multibyte UTF-8 sequences come out as the Latin-1 counterpart of each byte in the sequence.
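
The symptom can be reproduced outside Forrest with a few lines of Java. This is only a sketch under the assumption of Java 7 or later; the class and method names are hypothetical and nothing here comes from Chaperon or the Wiki plugin. It encodes a character as UTF-8 and decodes the resulting bytes as ISO-8859-1, so each byte of the sequence turns into its own character.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    /** Encodes text as UTF-8, then wrongly decodes the bytes as ISO-8859-1 (Latin-1). */
    static String decodeUtf8AsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00E5 (a with ring above) is the two-byte sequence 0xC3 0xA5 in UTF-8.
        String garbled = decodeUtf8AsLatin1("\u00e5");

        // Print code points in hex so the result does not depend on the console encoding.
        for (int i = 0; i < garbled.length(); i++) {
            System.out.println(String.format("U+%04X", (int) garbled.charAt(i)));
        }
    }
}
```

The single input character comes back as two characters, U+00C3 and U+00A5, which are the Latin-1 readings of the two UTF-8 bytes. Pure ASCII text passes through unchanged, which is why the problem only shows up for non-ASCII characters.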
A discussion about this bug can be found at:
[mail archive not yet updated, will add link here later]
