hiya,

On Tuesday 22 July 2008 03:32:11 am Joey Hess wrote:
> sean finney wrote:
> > if you add a file with utf-8 characters in its name, the resulting <a
> > href... links in the rendered pages have illegal characters in the name.
>
> What specific problems are you seeing with utf-8 characters in urls? I
> rarely read RFCs for fun anymore, but I was under the perhaps mistaken
> impression that they were fine unescaped outside of domain names now.

i'm playing around with a local ikiwiki install on my laptop, and get the 
following warning in my html validation:

line 311 column 1 - Warning: <a> escaping malformed URI reference

and the text explanation:

 “<…> escaping malformed URI reference”
Cause:

A URI contains impermissible characters or quotes around the URI are not 
closed.
Example:
Bad     <a href="http://www.mozilla.org/one space.html">space</a>
Good    <a href="http://www.mozilla.org/one%20space.html">space</a>
Good    <a href="http://www.mozilla.org/one+space.html">space</a>
Bad     <a href="http://www.w3.org/>W3C</a>
Good    <a href="http://www.w3.org/">W3C</a>
Bad     <a href="mailto:[EMAIL PROTECTED] space">Email me!</a>
Good    <a href="mailto:[EMAIL PROTECTED]">Email me!</a>

A space should not be contained in a URI (even if it works in all browsers…). 
This is detailed in RFC1738; look for the word “unsafe”.
Solution:

    * If the URI contains impermissible characters, replace the characters
      with permissible ones or percent-encode them. In a URI, an encoded
      byte is written as a percent sign followed by two hexadecimal digits
      (0-9 and a-f). The notation for the space character is %20.
    * If the URI is missing a quotation mark delimiter, add the character. 
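
For illustration, here is a minimal sketch of the percent-encoding described above, written in Python rather than ikiwiki's own Perl. The helper name escape_href and the sample file names are hypothetical, and this is not how ikiwiki builds its links; it only shows what an escaped href for a name with a space or a utf-8 character would look like.

```python
from urllib.parse import quote, unquote


def escape_href(path: str) -> str:
    """Percent-encode a link path (hypothetical helper, for illustration only).

    Non-ASCII characters are first encoded as UTF-8, then each byte is
    written as %XX. Path separators are left alone.
    """
    # safe="/" keeps "/" unescaped; spaces and non-ASCII bytes are escaped.
    return quote(path, safe="/", encoding="utf-8")


print(escape_href("one space.html"))   # one%20space.html
print(escape_href("wiki/café.html"))   # wiki/caf%C3%A9.html

# Decoding the escaped form gives back the original name.
print(unquote(escape_href("wiki/café.html")))   # wiki/café.html
```

Whether the validator still complains after escaping like this is something I'd expect but have not checked against every tool.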

References:

    * RFC2396 - Uniform Resource Identifiers (URI): Generic Syntax
    * RFC1738 - Uniform Resource Locators
    * W3Schools: Hexadecimal Format Reference 
