On Mon, Jan 14, 2013 at 09:18:52AM +0100, Peter Bex wrote:
> On Mon, Jan 14, 2013 at 02:42:40PM +0900, Alex Shinn wrote:
> > On Mon, Jan 14, 2013 at 1:36 PM, Sungjin Chun <[email protected]> wrote:
> > > As far as I know, revised RFC permits UTF-8 characters in the URL without
> > > encoding. Am I wrong here?
> >
> > Thus you can't use raw non-ASCII bytes in a URI - they must
> > be encoded, and interpretation is up to the origin (and is overwhelmingly
> > utf8 these days).
>
> Wow, thanks for doing the research! I was a bit lazy in not doing
> that in the first place. It's not the first time though that people
> think something's wrong in uri-generic whereas on closer reading of
> the RFC it turns out to be correct :)
>
> There is a very common misconception held by many programmers that you
> only need to encode an URI whenever the link doesn't work in a browser.
> However, this is a source of vulnerabilities and subtle bugs.
> A lot of browsers simply try to cope with broken HTML and even broken
> URI strings, apparently.
>
> > It would of course be possible for any tool or webserver to
> > accept URIs with non-ASCII bytes, but I don't know of any
> > browsers which would _send_ such a request, because in
> > general it would be rejected.
>
> We've decided to make uri-generic follow the RFC as closely as
> possible. To our knowledge, this library is the most RFC-compliant
> URI library available for *any* language.
>

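For reference, the rule Alex describes in the quoted text (non-ASCII
characters must be percent-encoded, usually as UTF-8 bytes) can be
sketched in a few lines of Python. This is only an illustration using
the standard library's urllib.parse, not uri-generic's API; the helper
name encode_path and the example path are made up for the sketch.

```python
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """Percent-encode a path (hypothetical helper, for illustration only).

    Non-ASCII characters are converted to their UTF-8 bytes and each
    byte is written as %XX, so the result contains only ASCII.
    """
    # quote() uses UTF-8 by default; "/" is kept as the segment separator.
    return quote(path, safe="/")


raw = "/caf\u00e9"               # "/café": one non-ASCII character
encoded = encode_path(raw)

print(encoded)                   # /caf%C3%A9
print(encoded.isascii())         # True
print(unquote(encoded) == raw)   # True
```

The encoded form is what should go on the wire; whether the receiving
side interprets the bytes as UTF-8 is, as noted above, up to the origin.
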
I worked on an FTP program years ago that operated in an ecosystem where
lots of technically incorrect URLs were pasted around, and we got a bug
report that they weren't working in our client. To 'fix' it, we had to
remove support for correct URLs to handle this more common use case.

I regret having to do that to this day, so thank you very much for
RFC-compliant parsing.

-Alan
-- 
my personal website: http://c0redump.org/

_______________________________________________
Chicken-users mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/chicken-users
