On Sat, Dec 26, 2009 at 14:17, Sergiu Dumitriu <[email protected]> wrote:
> Hi devs,
>
> The short version:
>
> Should we always use UTF-8 for encoding and decoding URLs, regardless of
> the wiki encoding, for better compliance with web standards?
>
>
> The long version:
>
> By definition, URLs can only contain ASCII characters, everything else
> must be converted to their corresponding bytes and escaped as %XY
> escapes. The problem is that "their corresponding bytes" implies a
> charset + encoding, and no specification *enforces* a specific pair,
> although it is *recommended* to use Unicode + UTF8, to comply with the
> modern tendency of the web in general.
>
> Traditionally, XWiki has been using the same encoding as the configured
> global wiki encoding for the URLs, which means that before 1.9, when we
> switched to UTF8 as the default wiki encoding, all URLs were using the
> ISO-8859-1 encoding. Since the switch to UTF-8, URLs are also using the
> UTF-8 encoding by default, although the wiki encoding can be changed.
>
> Now, since 2.1, a bugfix accidentally changed the behavior, so that
> parsing back URLs always uses the UTF-8 encoding, even though composing
> URLs continues to use the wiki encoding. This is a bug, which prevents
> changing the encoding to anything other than UTF-8, and it should be fixed.
>
> Now, we have two options:
>
> 1. Reintroduce the old behavior, so that URLs always use the wiki
> encoding. This is a direct bugfix.
> 2. Also change the encoding part, so that UTF-8 is always used. This is
> an improvement, going towards better compliance with web standards.
>
> Personally I think that the second option is the better one, but it
> requires a vote, since it has a few drawbacks.
>
> Advantages:
> + better compliance with web standards, since UTF-8 is the recommended
> encoding for URLs (although not imposed)
> + support for a wider range of document names, since UTF-8 allows
> full-unicode document names, while ISO-8859-1 limits names to latin1
> characters
> + better support from browsers, since entering accented characters
> directly in the address bar encodes the URL sent to the server using
> UTF-8, and decoding the URL also assumes UTF-8; this means that a
> document named "é" will be printed as .../view/Main/%E9 and will have to
> be entered the same way in the address bar when ISO-8859-1 is used, and
> as .../view/Main/é when UTF-8 is used
>
> Drawbacks:
> - by default Tomcat uses ISO-8859-1 as the encoding for URLs, so the
> Tomcat configuration will have to be changed as in
> http://platform.xwiki.org/xwiki/bin/view/AdminGuide/Encoding#HTomcat
> - some existing bookmarks will not work anymore once the encoding is changed
>
> +1 for option 2 from me,
+1 for 2

> --
> Sergiu Dumitriu
> http://purl.org/net/sergiu/
> _______________________________________________
> devs mailing list
> [email protected]
> http://lists.xwiki.org/mailman/listinfo/devs
>

--
Thomas Mortagne
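
For reference, the "é" example from the quoted message can be reproduced with a small Java sketch. It uses java.net.URLEncoder and java.net.URLDecoder as stand-ins for XWiki's actual URL handling, which this thread does not show; the class name UrlEncodingDemo and its two helper methods are hypothetical. Note that URLEncoder implements form encoding (a space becomes "+"), so it only approximates the escaping of a URL path segment.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, not part of XWiki.
class UrlEncodingDemo {

    // Percent-escape a document name using the given charset.
    static String encode(String name, String charset) {
        try {
            return URLEncoder.encode(name, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    // Turn %XY escapes back into characters using the given charset.
    static String decode(String escaped, String charset) {
        try {
            return URLDecoder.decode(escaped, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String name = "\u00e9"; // the document name "é"

        // The same character maps to different bytes, hence different escapes.
        System.out.println(encode(name, "ISO-8859-1")); // %E9
        System.out.println(encode(name, "UTF-8"));      // %C3%A9

        // The mismatch described for 2.1: compose with the wiki encoding,
        // parse with UTF-8. The name does not survive the round trip.
        String mismatched = decode(encode(name, "ISO-8859-1"), "UTF-8");
        System.out.println(name.equals(mismatched)); // false

        // Option 2: UTF-8 on both sides round-trips cleanly.
        String consistent = decode(encode(name, "UTF-8"), "UTF-8");
        System.out.println(name.equals(consistent)); // true
    }
}
```

The third line of output is the interesting one: a URL composed with ISO-8859-1 but parsed as UTF-8 no longer identifies the document "é", which is why the two sides have to agree on one encoding.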

