> From: Daniel Vogelheim [mailto:[EMAIL PROTECTED]]
> 
> Hello,
> 
> I have a problem with non-English letters in URLs. I suspect it's a
> bug, but I'm not sure where exactly the problem is. So I think it's
> best to give a description of what I'm doing:
> 
> I want to automatically create text buttons with Batik: First, I
> create the graphics in StarOffice and save it as .svg file. The Cocoon
> pipeline then reads this, a simple XSLT script exchanges the text in
> the .svg file with part of the URL, and then Batik renders it as JPEG.
> 
> The corresponding pipeline definition looks like this:
>  <map:pipeline>
>   <map:match pattern="xxx/auto-img/*/*.jpg">
>    <map:generate src="xxx/auto-img/{1}.svg"/>
>    <map:transform src="xxx/auto-img/auto-img.xsl" type="xslt">
>     <map:parameter name="text" value="{2}"/>
>    </map:transform>
>    <map:serialize type="svg2jpeg"/>
>   </map:match>
>  </map:pipeline>
> 
> The auto-img.xsl is a dead simple script consisting of the well-known
> XSLT copy rule, and one other rule which exchanges the text "REPLACE"
> with "{$text}".
> 
> Result: The url xxx/auto-img/button/Hello%20World.jpg delivers a fancy
> graphical button based on button.svg, saying "Hello World".
> 
> I use this from another style sheet which reads elements
>   <menu href="target.html">description</menu>
> and translates them into:
>   <a href="target.html">
>    <img src="xxx/auto-img/button/description.jpg"/>
>   </a>
> 
> Result: Nice looking graphical menus with very little effort.
> 
> All of this actually works, and took me only 1.5 hours. :-)))
> 
> But the problem is... it doesn't work with non-ASCII letters. E.g. if
> my text contains German Umlauts (vowels a,o,u with two dots on them),
> the resulting button displays two arbitrary characters.
> 
> I suspect what goes bad is the URL encoding (i.e. encoding 'special'
> characters as %xx escape sequences). I think at some point the string
> gets converted into URLs using UTF-8, but elsewhere gets decoded in
> some 8-bit character set. Thus, I get two garbage characters where I
> expected my Umlaut.
> 
> My questions are:
> 
> - How does Cocoon encode URLs? As UTF-8, with %xx escapes?

In which place? Do you mean the HTML you wrote above?

>    <img src="xxx/auto-img/button/description.jpg"/>

Here Cocoon writes all the text in the encoding you specify for the
serializer. The browser then reads and interprets the page, and encodes
the URL in some way of its own when it sends the HTTP request for the
picture. Finally, the servlet engine processes and decodes that request.
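
A minimal sketch of how such a chain can go wrong, written in Python
purely for illustration (it is not Cocoon, browser, or servlet engine
code). It assumes one side escapes the text as UTF-8 and the other side
decodes the escapes as ISO-8859-1; the helper name round_trip is made up
for this example.

```python
from urllib.parse import quote, unquote


def round_trip(text, encode_charset, decode_charset):
    """Percent-encode text in one charset, then decode it in another."""
    escaped = quote(text, safe="", encoding=encode_charset)
    decoded = unquote(escaped, encoding=decode_charset)
    return escaped, decoded


# "\u00fc" is u-umlaut. As UTF-8 it is two bytes, so it gets two %xx escapes.
escaped, decoded = round_trip("\u00fc", "utf-8", "iso-8859-1")
print(escaped)       # the escaped form that travels in the URL
print(len(decoded))  # number of characters after the mismatched decode

# With the same charset on both ends the text survives unchanged.
_, same = round_trip("\u00fc", "utf-8", "utf-8")
print(same == "\u00fc")
```

If both ends agreed on the charset, the umlaut would come back as a
single character; the mismatch is what turns it into two.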


> - I would think this to be a common problem. Are there
> URL-encoding/decoding methods available in XSLT that I could use to
> manually solve the problem?

I would suggest creating URLs without national characters. Otherwise
(AFAIU) you will have to test on all browsers under different OS and
region settings, just to make sure that each browser/OS/region
combination behaves as expected.

One way is to issue URLs like xxx/auto-img/button/<number>, and keep a
<number>-to-text mapping somewhere (say, in the session, like the
fragment extractor does).
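
A rough sketch of such a mapping, in Python for brevity. The class name
ButtonRegistry and its methods are hypothetical, and where the object
lives (session or elsewhere) is left open; the only point is that the
URL stays pure ASCII while the real text is looked up on the server.

```python
class ButtonRegistry:
    """Hypothetical number-to-text mapping, e.g. one instance per session."""

    def __init__(self, prefix="xxx/auto-img/button"):
        self._prefix = prefix
        self._texts = []    # position in this list is the button number
        self._numbers = {}  # reverse lookup so a repeated text reuses its number

    def url_for(self, text):
        """Return an ASCII-only URL for the given button text."""
        number = self._numbers.get(text)
        if number is None:
            number = len(self._texts)
            self._texts.append(text)
            self._numbers[text] = number
        return f"{self._prefix}/{number}.jpg"

    def text_for(self, number):
        """Return the original text for a number taken from a request URL."""
        return self._texts[number]


registry = ButtonRegistry()
url = registry.url_for("Gr\u00f6\u00dfe")  # text with an umlaut and sharp s
print(url)
print(registry.text_for(0) == "Gr\u00f6\u00dfe")
```

The stylesheet that writes the <img> tag would call something like
url_for, and the image pipeline would call text_for with the number
from the matched URL.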

See also: http://www.w3.org/Addressing/rfc1738.txt, "2.2. URL Character
Encoding Issues". It covers only the US-ASCII character set.


> (I checked the XSLT standard, and it doesn't have this. I also checked
> the library of extension functions on the Xalan page.)
> 
> - How can I find out where the encoding (or decoding) actually goes
> wrong? So far, I can only see the outcome, but I don't know how to do
> debugging on this.

You can use Catalina's <Valve
className="org.apache.catalina.valves.RequestDumperValve"/> to see
what's going on (see tomcat/conf/server.xml).
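
Once you have the raw, still-escaped request path from such a dump, a
small helper can show how it reads under each candidate charset. This is
only a sketch in Python: the function name decode_candidates and the two
sample paths are invented for illustration, not taken from real Tomcat
output.

```python
from urllib.parse import unquote


def decode_candidates(raw_path, charsets=("utf-8", "iso-8859-1")):
    """Return {charset: decoded path} for a raw, still-escaped request path."""
    return {cs: unquote(raw_path, encoding=cs, errors="replace") for cs in charsets}


# Two ways a client might escape u-umlaut: as UTF-8 bytes or as one Latin-1 byte.
for raw in ("xxx/auto-img/button/%C3%BC.jpg", "xxx/auto-img/button/%FC.jpg"):
    for charset, decoded in decode_candidates(raw).items():
        # ascii() keeps the output readable on any console encoding.
        print(raw, charset, ascii(decoded))
```

The charset under which the decoded path shows the expected letter
tells you what the client actually sent; if the button shows something
else, the decoding side disagrees.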

Vadim
 
> Thanks for all answers...
> 
> Sincerely,
> Daniel
> 
> 
> P.S.: I noticed that in my setup Batik has terrible kerning problems:
> All characters are of equal width! 'i' and 'l' leave huge gaps, and
> 'm' overlaps with following letters. Is this a known problem of Batik,
> or is maybe something wrong with my environment? (e.g. fonts?)

