Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

Peter Bex Mon, 14 Jan 2013 00:19:21 -0800

On Mon, Jan 14, 2013 at 02:42:40PM +0900, Alex Shinn wrote:
> On Mon, Jan 14, 2013 at 1:36 PM, Sungjin Chun <[email protected]> wrote:
> > As far as I know, revised RFC permits UTF-8 characters in the URL without
> > encoding. Am I wrong here?
> 
> Thus you can't use raw non-ASCII bytes in a URI - they must
> be encoded, and interpretation is up to the origin (and is overwhelmingly
> utf8 these days).


Wow, thanks for doing the research!  I was a bit lazy in not doing
that in the first place.  It's not the first time though that people
think something's wrong in uri-generic whereas on closer reading of
the RFC it turns out to be correct :)

There is a very common misconception held by many programmers that you
only need to encode a URI when the link doesn't work in a browser.
However, this is a source of vulnerabilities and subtle bugs.
Apparently, a lot of browsers simply try to cope with broken HTML and
even broken URI strings, which hides the problem.
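
To illustrate what "must be encoded" means in practice, here is a
minimal sketch in Python (not CHICKEN, and not how uri-generic does it
internally).  It assumes the origin interprets the bytes as UTF-8; the
helper name encode_path_segment is made up for this example.

```python
from urllib.parse import quote, unquote


def encode_path_segment(segment: str) -> str:
    """Percent-encode one path segment, treating the text as UTF-8.

    safe="" means that even "/" is encoded, so the result is always a
    single segment made of ASCII characters only.
    """
    return quote(segment, safe="", encoding="utf-8")


# U+00E9 is encoded in UTF-8 as the two bytes C3 A9,
# and each byte becomes one %XX triplet.
encoded = encode_path_segment("caf\u00e9")
print(encoded)

# Decoding with the same charset gives the original text back.
print(unquote(encoded, encoding="utf-8") == "caf\u00e9")
```

The result contains only ASCII, which is what goes on the wire; the
choice of UTF-8 is a convention of the origin, not something the URI
itself records.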

> It would of course be possible for any tool or webserver to
> accept URIs with non-ASCII bytes, but I don't know of any
> browsers which would _send_ such a request, because in
> general it would be rejected.

We've decided to make uri-generic follow the RFC as closely as
possible.  To our knowledge, this library is the most RFC-compliant
URI library available for *any* language.

Cheers,
Peter
-- 
http://sjamaan.ath.cx

_______________________________________________
Chicken-users mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/chicken-users
