On Sat, Dec 26, 2009 at 14:17, Sergiu Dumitriu <[email protected]> wrote:
> Hi devs,
>
> The short version:
>
> Should we always use UTF-8 for encoding and decoding URLs, regardless of
> the wiki encoding, for better compliance with web standards?
>
>
> The long version:
>
> By definition, URLs can only contain ASCII characters; everything else
> must be converted to its corresponding bytes and escaped as %XY
> sequences. The problem is that "its corresponding bytes" implies a
> charset + encoding, and no specification *enforces* a specific pair,
> although Unicode + UTF-8 is *recommended*, in line with the modern
> tendency of the web in general.
>
> Traditionally, XWiki has used the configured global wiki encoding for
> URLs as well, which means that before 1.9, when we switched to UTF-8 as
> the default wiki encoding, all URLs used the ISO-8859-1 encoding. Since
> the switch, URLs also use UTF-8 by default, although the wiki encoding
> can still be changed.
>
> Now, since 2.1, a bugfix accidentally changed the behavior, so that
> parsing back URLs always uses the UTF-8 encoding, even though composing
> URLs continues to use the wiki encoding. This is a bug, which prevents
> changing the encoding to anything other than UTF-8, and it should be fixed.
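
To illustrate the mismatch described above, here is a minimal sketch in Python (not XWiki code). The helper names compose and parse are made up for the example; they only stand in for "build a URL with the wiki encoding" and "read a URL back with a given encoding".

```python
from urllib.parse import quote, unquote


def compose(name: str, wiki_encoding: str) -> str:
    """Percent-encode a document name using the given (wiki) encoding."""
    return quote(name, safe="", encoding=wiki_encoding)


def parse(segment: str, url_encoding: str) -> str:
    """Decode a percent-encoded URL segment using the given encoding."""
    return unquote(segment, encoding=url_encoding, errors="replace")


name = "é"

# Composing with an ISO-8859-1 wiki encoding yields a single escaped byte.
segment = compose(name, "iso-8859-1")

# Parsing with the same encoding round-trips the name.
same_encoding = parse(segment, "iso-8859-1")

# Parsing with UTF-8 instead cannot decode the lone 0xE9 byte, so the
# name comes back as the U+FFFD replacement character.
mismatched = parse(segment, "utf-8")

print(segment, repr(same_encoding), repr(mismatched))
```

With a non-UTF-8 wiki encoding, the document name is lost as soon as the two sides disagree on the encoding.
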
>
> Now, we have two options:
>
> 1. Reintroduce the old behavior, so that URLs always use the wiki
> encoding. This is a direct bugfix.
> 2. Also change the encoding part, so that UTF-8 is always used. This is
> an improvement, going towards better compliance with web standards.
>
> Personally I think that the second option is the better one, but it
> requires a vote, since it has a few drawbacks.
>
> Advantages:
> + better compliance with web standards, since UTF-8 is the recommended
> encoding for URLs (although not imposed)
> + support for a wider range of document names, since UTF-8 allows
> full Unicode document names, while ISO-8859-1 limits names to Latin-1
> characters
> + better support from browsers, since accented characters typed
> directly in the address bar are sent to the server encoded as UTF-8,
> and URLs are also decoded for display assuming UTF-8; this means that a
> document named "é" is displayed as .../view/Main/%E9 and has to be
> entered the same way in the address bar when ISO-8859-1 is used, but
> appears as .../view/Main/é when UTF-8 is used
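
As a rough illustration of the first two points, here is a Python sketch showing how the same path segment is escaped under each encoding, and what happens to a name outside Latin-1. The function name encode_segment and the sample names are assumptions for the example, not anything from XWiki.

```python
from typing import Optional
from urllib.parse import quote


def encode_segment(name: str, encoding: str) -> Optional[str]:
    """Percent-encode a path segment, or return None if the
    encoding cannot represent the name at all."""
    try:
        return quote(name, safe="", encoding=encoding)
    except UnicodeEncodeError:
        return None


# A Latin-1 name can be escaped under both encodings, with different bytes.
latin1_form = encode_segment("é", "iso-8859-1")
utf8_form = encode_segment("é", "utf-8")

# A name outside Latin-1 can only be escaped with UTF-8.
cjk_latin1 = encode_segment("日", "iso-8859-1")
cjk_utf8 = encode_segment("日", "utf-8")

print(latin1_form, utf8_form, cjk_latin1, cjk_utf8)
```
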
>
> Drawbacks:
> - by default Tomcat uses ISO-8859-1 as the encoding for URLs, so the
> Tomcat configuration will have to be changed as in
> http://platform.xwiki.org/xwiki/bin/view/AdminGuide/Encoding#HTomcat
> - some existing bookmarks will not work anymore once the encoding is changed
>
> +1 for option 2 from me,

+1 for 2

> --
> Sergiu Dumitriu
> http://purl.org/net/sergiu/
> _______________________________________________
> devs mailing list
> [email protected]
> http://lists.xwiki.org/mailman/listinfo/devs
>



-- 
Thomas Mortagne