On Mon, Jan 14, 2013 at 02:42:40PM +0900, Alex Shinn wrote:
> On Mon, Jan 14, 2013 at 1:36 PM, Sungjin Chun <[email protected]> wrote:
> > As far as I know, revised RFC permits UTF-8 characters in the URL without
> > encoding. Am I wrong here?
>
> Thus you can't use raw non-ASCII bytes in a URI - they must
> be encoded, and interpretation is up to the origin (and is overwhelmingly
> utf8 these days).

Wow, thanks for doing the research!  I was a bit lazy in not doing that
in the first place.  It's not the first time, though, that people think
something's wrong in uri-generic when a closer reading of the RFC shows
it to be correct :)

There is a very common misconception among programmers that you only
need to encode a URI when the link doesn't work in a browser.  However,
relying on that is a source of vulnerabilities and subtle bugs.
Apparently, a lot of browsers simply try to cope with broken HTML and
even broken URI strings, so a link that works in a browser is not
necessarily a valid URI.

> It would of course be possible for any tool or webserver to
> accept URIs with non-ASCII bytes, but I don't know of any
> browsers which would _send_ such a request, because in
> general it would be rejected.

We've decided to make uri-generic follow the RFC as closely as possible.
To our knowledge, this library is the most RFC-compliant URI library
available for *any* language.

Cheers,
Peter
-- 
http://sjamaan.ath.cx

_______________________________________________
Chicken-users mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/chicken-users
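
For readers who want to see the encoding rule discussed above in action,
here is a minimal sketch.  It uses Python's standard urllib.parse module,
not uri-generic, and the helper name encode_path is made up for this
illustration.  It assumes the origin interprets the encoded bytes as
UTF-8, which the thread notes is the overwhelmingly common case but is
not something the URI itself guarantees.

```python
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """Percent-encode each segment of a path (hypothetical helper).

    Non-ASCII characters are first converted to UTF-8 bytes, then each
    byte is written as %XX.  The "/" separators between segments are
    kept; everything else outside the unreserved set is encoded.
    """
    return "/".join(
        quote(segment, safe="", encoding="utf-8")
        for segment in path.split("/")
    )


raw = "/wiki/日本語"
encoded = encode_path(raw)

print(raw.isascii())      # False: not usable as-is in a URI
print(encoded)            # /wiki/%E6%97%A5%E6%9C%AC%E8%AA%9E
print(encoded.isascii())  # True: only ASCII goes over the wire
print(unquote(encoded))   # /wiki/日本語 (decoded as UTF-8)
```

Decoding only round-trips here because both sides agree on UTF-8; a
server that assumed a different character encoding would see different
text for the same percent-encoded bytes.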
