Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.

.alyn.post. Mon, 14 Jan 2013 13:40:30 -0800

On Mon, Jan 14, 2013 at 09:18:52AM +0100, Peter Bex wrote:
> On Mon, Jan 14, 2013 at 02:42:40PM +0900, Alex Shinn wrote:
> > On Mon, Jan 14, 2013 at 1:36 PM, Sungjin Chun <[email protected]> wrote:
> > > As far as I know, revised RFC permits UTF-8 characters in the URL without
> > > encoding. Am I wrong here?
> > 
> > Thus you can't use raw non-ASCII bytes in a URI - they must
> > be encoded, and interpretation is up to the origin (and is overwhelmingly
> > utf8 these days).
> 
> Wow, thanks for doing the research!  I was a bit lazy in not doing
> that in the first place.  It's not the first time though that people
> think something's wrong in uri-generic whereas on closer reading of
> the RFC it turns out to be correct :)
> 
> There is a very common misconception held by many programmers that you
> only need to encode a URI whenever the link doesn't work in a browser.
> However, this is a source of vulnerabilities and subtle bugs. 
> A lot of browsers simply try to cope with broken HTML and even broken
> URI strings, apparently.
> 
> > It would of course be possible for any tool or webserver to
> > accept URIs with non-ASCII bytes, but I don't know of any
> > browsers which would _send_ such a request, because in
> > general it would be rejected.
> 
> We've decided to make uri-generic follow the RFC as closely as
> possible.  To our knowledge, this library is the most RFC-compliant
> URI library available for *any* language.
>


I worked on an FTP program years ago that operated in an ecosystem
where lots of technically incorrect URLs were pasted around, and
we got a bug report that those URLs didn't work in our client.

To 'fix' it, we had to drop support for correct URLs in order to
handle the more common, incorrect ones.  I regret having to do that
to this day, so thank you very much for RFC-compliant parsing.
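
For anyone finding this thread later, here is a minimal sketch of what
"they must be encoded" means in practice.  It is Python, not CHICKEN,
and it uses only the standard library's urllib.parse; it is not the
uri-common or uri-generic API.  The helper name encode_path and the
sample path are made up for illustration.

```python
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """Percent-encode a URI path (hypothetical helper, for illustration).

    Non-ASCII characters are first turned into their UTF-8 bytes and each
    byte is then written as %XX.  The path separator '/' is left alone.
    """
    return quote(path, safe="/", encoding="utf-8")


# A path with raw Korean characters, as someone might paste it.
raw = "/wiki/한글"
encoded = encode_path(raw)
print(encoded)

# The encoded form contains only ASCII, so it is safe to send on the wire.
assert encoded.isascii()

# The receiving side chooses how to interpret the bytes; decoding as
# UTF-8 round-trips back to the original text.
assert unquote(encoded, encoding="utf-8") == raw
```

The point is that the encoded form is what goes into the request, and
whether those bytes mean UTF-8 is up to the origin, as Alex said above.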

-Alan
-- 
my personal website: http://c0redump.org/

_______________________________________________
Chicken-users mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/chicken-users
