Hello,
I have a problem with non-English letters in URLs. I suspect it's a
bug, but I'm not sure where exactly the problem is. So I think it's
best to give a description of what I'm doing:
I want to automatically create text buttons with Batik: First, I
create the graphics in StarOffice and save them as an .svg file. The
Cocoon pipeline then reads this file, a simple XSLT script replaces a
placeholder text in the .svg file with part of the URL, and then Batik
renders the result as a JPEG.
The corresponding pipeline definition looks like this:
<map:pipeline>
  <map:match pattern="xxx/auto-img/*/*.jpg">
    <map:generate src="xxx/auto-img/{1}.svg"/>
    <map:transform src="xxx/auto-img/auto-img.xsl" type="xslt">
      <map:parameter name="text" value="{2}"/>
    </map:transform>
    <map:serialize type="svg2jpeg"/>
  </map:match>
</map:pipeline>
The auto-img.xsl is a dead simple script consisting of the well-known
XSLT copy rule and one other rule, which replaces the text "REPLACE"
with "{$text}".
Result: The URL xxx/auto-img/button/Hello%20World.jpg delivers a fancy
graphical button based on button.svg, saying "Hello World".
I use this from another style sheet which reads elements
  <menu href="target.html">description</menu>
and translates them into:
  <a href="target.html">
    <img src="xxx/auto-img/button/description.jpg"/>
  </a>
Result: Nice looking graphical menus with very little effort.
All of this actually works, and took me only 1.5 hours. :-)))
But the problem is... it doesn't work with non-ASCII letters. E.g. if
my text contains German umlauts (the vowels a, o, u with two dots on
them), the resulting button displays two arbitrary characters in place
of each umlaut.
I suspect what goes wrong is the URL encoding (i.e. encoding 'special'
characters as %xx escape sequences). I think at some point the string
gets percent-encoded as UTF-8 when the URL is built, but elsewhere the
URL gets decoded in some 8-bit character set. UTF-8 stores each umlaut
as two bytes, so an 8-bit decoder would read them as two separate
characters. Thus, I get two garbage characters where I expected my
umlaut.
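To make this suspicion concrete, here is a minimal sketch in Python
(used purely for illustration; it does not show what Cocoon or the
servlet container actually do internally). The sample label and the
choice of ISO-8859-1 as the "8-bit character set" are my assumptions:

```python
from urllib.parse import quote, unquote

# Hypothetical button label containing one umlaut (a with two dots).
text = "M\u00e4nner"

# Step 1: the URL is built by percent-encoding the UTF-8 bytes.
encoded = quote(text, encoding="utf-8")

# Step 2 (suspected bug): the receiver decodes the escapes as ISO-8859-1,
# so the two UTF-8 bytes of the umlaut become two separate characters.
wrong = unquote(encoded, encoding="iso-8859-1")

# Step 2 (what I would expect): decode with the same charset as step 1.
right = unquote(encoded, encoding="utf-8")

print(encoded)          # M%C3%A4nner
print(len(text))        # 6
print(len(wrong))       # 7: one umlaut turned into two characters
print(right == text)    # True
```

If this is what happens, the fix would be to make both sides agree on
one charset, rather than to change the XSLT itself.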
My questions are:
- How does Cocoon encode URLs? As UTF-8, with %xx escapes?
- I would think this to be a common problem. Are there
URL-encoding/decoding methods available in XSLT that I could use to
manually solve the problem?
(I checked the XSLT standard, and it has no such functions. I also
checked the library of extension functions on the Xalan page.)
- How can I find out where the encoding (or decoding) actually goes
wrong? So far, I can only see the outcome, but I don't know how to do
debugging on this.
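For the last question, one approach I can think of is to compare the
text I put in with the garbage I get out, and test which pair of
charsets reproduces it. A rough sketch in Python follows; the function
name find_mismatch and the list of candidate charsets are my own
inventions, and it only tells me which charsets are involved, not which
component is at fault:

```python
def find_mismatch(expected, observed,
                  codecs=("utf-8", "iso-8859-1", "cp1252")):
    """Return (encode_charset, decode_charset) pairs that turn
    `expected` into `observed` when the two charsets differ."""
    hits = []
    for enc in codecs:
        for dec in codecs:
            if enc == dec:
                continue  # same charset on both sides cannot garble text
            try:
                garbled = expected.encode(enc).decode(dec)
            except (UnicodeEncodeError, UnicodeDecodeError):
                continue  # this pair cannot even produce a string
            if garbled == observed:
                hits.append((enc, dec))
    return hits


# Example: a single u-umlaut that shows up as two characters.
print(find_mismatch("\u00fc", "\u00c3\u00bc"))
# [('utf-8', 'iso-8859-1'), ('utf-8', 'cp1252')]
```

Pure ASCII input is useless for this test, because it looks the same in
all three charsets; the input has to contain at least one umlaut.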
Thanks for all answers...
Sincerely,
Daniel
P.S.: I noticed that in my setup Batik has terrible kerning problems:
All characters are of equal width! 'i' and 'l' leave huge gaps, and
'm' overlaps with the following letters. Is this a known problem with
Batik, or is maybe something wrong with my environment (e.g. fonts)?