Hello,

I have a problem with non-English letters in URLs. I suspect it's a
bug, but I'm not sure where exactly the problem is, so I think it's
best to describe what I'm doing:

I want to create text buttons automatically with Batik. First, I
create the graphic in StarOffice and save it as an .svg file. The Cocoon
pipeline then reads this file, a simple XSLT script replaces the text in
the .svg file with part of the URL, and Batik renders the result as a JPEG.

The corresponding pipeline definition looks like this:
 <map:pipeline>
  <map:match pattern="xxx/auto-img/*/*.jpg">
   <map:generate src="xxx/auto-img/{1}.svg"/>
   <map:transform src="xxx/auto-img/auto-img.xsl" type="xslt">
    <map:parameter name="text" value="{2}"/>
   </map:transform>
   <map:serialize type="svg2jpeg"/>
  </map:match>
 </map:pipeline>

The auto-img.xsl is a dead simple script consisting of the well-known
XSLT copy rule and one other rule, which replaces the text "REPLACE"
with the value of the "text" parameter ("{$text}").

Result: The url xxx/auto-img/button/Hello%20World.jpg delivers a fancy
graphical button based on button.svg, saying "Hello World".

I use this from another style sheet which reads elements 
  <menu href="target.html">description</menu>
and translates them into:
  <a href="target.html">
   <img src="xxx/auto-img/button/description.jpg"/>
  </a>
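Because the menu stylesheet copies the description straight into the img
src, any non-ASCII characters are left for the browser to escape however
it sees fit. One way to remove that ambiguity is to emit an already
escaped path segment. The Java sketch below (Java 10+ for the Charset
overload) only illustrates the escaping rule; the helper name buttonSrc
is hypothetical, and wiring it into the XSLT is not shown. It can only
help if the server side then decodes the path as UTF-8, which is exactly
the open question.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ButtonUrl {
    // Hypothetical helper: build the auto-img URL for one menu description.
    static String buttonSrc(String description) {
        // URLEncoder does HTML form encoding: UTF-8 bytes become %XX escapes,
        // but spaces become '+'. In a URL path a '+' is a literal plus sign,
        // so switch spaces to %20. A real '+' was already escaped as %2B.
        String segment = URLEncoder.encode(description, StandardCharsets.UTF_8)
                                   .replace("+", "%20");
        return "xxx/auto-img/button/" + segment + ".jpg";
    }

    public static void main(String[] args) {
        System.out.println(buttonSrc("Hello World"));    // xxx/auto-img/button/Hello%20World.jpg
        System.out.println(buttonSrc("\u00dcber uns"));  // xxx/auto-img/button/%C3%9Cber%20uns.jpg
    }
}
```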

Result: Nice looking graphical menus with very little effort.

All of this actually works, and took me only 1.5 hours. :-)))

But the problem is... it doesn't work with non-ASCII letters. For
example, if my text contains German umlauts (the vowels a, o, u with two
dots on them), the resulting button displays two garbage characters instead.

I suspect the URL encoding is what goes wrong (i.e. encoding 'special'
characters as %xx escape sequences). My guess is that at some point the
string is encoded into the URL as UTF-8, but somewhere else it is decoded
using some 8-bit character set. An umlaut takes two bytes in UTF-8, and an
8-bit decoder turns each byte into a character of its own, so I get two
garbage characters where I expected my umlaut.
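The suspected mismatch is easy to reproduce outside Cocoon. This Java
sketch (Java 10+ for the Charset overloads of URLEncoder and URLDecoder)
encodes "Grün" as UTF-8 escapes and then decodes those escapes as
ISO-8859-1, which is assumed here as the "8-bit character set". It only
reproduces the symptom. It does not show where inside Cocoon the decoding
actually happens.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class UmlautRoundTrip {
    // Percent-escape the UTF-8 bytes of the text (u-umlaut -> %C3%BC).
    static String encodeUtf8(String text) {
        return URLEncoder.encode(text, StandardCharsets.UTF_8);
    }

    // Decode the escapes but treat each byte as one ISO-8859-1 character.
    static String decodeLatin1(String escaped) {
        return URLDecoder.decode(escaped, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "Gr\u00fcn";
        String escaped = encodeUtf8(original);   // Gr%C3%BCn
        String garbled = decodeLatin1(escaped);  // G r U+00C3 U+00BC n: two chars for one
        System.out.println(escaped);
        System.out.println(garbled.length());    // 5 instead of 4
    }
}
```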

My questions are:

- How does Cocoon encode URLs? As UTF-8, with %xx escapes?

- I would expect this to be a common problem. Are there
URL-encoding/decoding functions available in XSLT that I could use to
solve the problem manually?

(I checked the XSLT standard, and it has nothing like this. I also
checked the library of extension functions on the Xalan page, without
success.)

- How can I find out where the encoding (or decoding) actually goes
wrong? So far I can only see the outcome, and I don't know how to
debug this.
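For the debugging question, a cheap first step is to print the code
points of the text wherever it can be observed, for example in a small
test harness or a custom component. Where to hook this into the pipeline
is left open. If U+00C3 U+00BC shows up where U+00FC was expected, the
bytes were UTF-8 but were decoded as ISO-8859-1 (assumed here; a
different 8-bit set would give other pairs). The sketch also includes a
hypothetical stopgap, undoLatin1Misdecode, that reverses exactly that one
mistake. Xalan-Java can call static Java methods from a stylesheet through
its Java extension mechanism, so a helper like this could in principle be
used from auto-img.xsl. That wiring is not shown, and it does not fix the
underlying decoding.

```java
import java.nio.charset.StandardCharsets;

public class EncodingDebug {
    // Show every code point as U+XXXX so a mis-decoded string is easy to spot.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    // Hypothetical workaround: undo one UTF-8 -> ISO-8859-1 mis-decode by
    // recovering the raw bytes and decoding them again as UTF-8.
    // Only valid if every char is <= U+00FF; getBytes replaces others with '?'.
    static String undoLatin1Misdecode(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String received = "Gr\u00c3\u00bcn";  // what the button apparently gets
        System.out.println(dump(received));    // U+0047 U+0072 U+00C3 U+00BC U+006E
        System.out.println(dump(undoLatin1Misdecode(received)));  // U+0047 U+0072 U+00FC U+006E
    }
}
```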

Thanks for any answers...

Sincerely,
Daniel


P.S.: I noticed that in my setup Batik has terrible spacing problems:
all characters have equal width! 'i' and 'l' leave huge gaps, and
'm' overlaps the following letters. Is this a known problem with Batik,
or is something wrong with my environment (e.g. fonts)?

