[Cc'ed to [EMAIL PROTECTED] and to TimB as I try to blame him :-)]
Ilmari Karonen <[EMAIL PROTECTED]> writes:
> In comp.lang.perl.misc, there has been -- yet again -- heated discussion
> about the default behavior of URI::Escape. Briefly, it all started when
> someone recommended the use of said module for composing a URI, without
> mentioning the need to use a non-default set of escaped characters.
Personally I never use the uri_escape() function for anything. I
always use the URI objects as they always get the escaping correct
without me having to think again. If I want custom escaping I always
use something like:
s/([...])/$URI::Escape::escapes{$1}/g
I agree that the current uri_escape() default is a bit useless.
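A minimal sketch of this kind of custom escaping, applied to each component before the URI is joined. It builds a local %escapes table as a stand-in for %URI::Escape::escapes, so that it runs without the module installed. The helper name escape_component, the example.com URL and the parameter values are made up for illustration.

```perl
use strict;
use warnings;

# Local stand-in for %URI::Escape::escapes (an assumption made so the sketch
# is self-contained): maps each byte 0..255 to its %XX form.
my %escapes = map { chr($_) => sprintf("%%%02X", $_) } 0 .. 255;

# Hypothetical helper: escape every character outside the RFC 2396
# unreserved set. Handles bytes only; characters above 255 have no entry.
sub escape_component {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.!~*'()])/$escapes{$1}/g;
    return $text;
}

# Escape each name and value first, then join with the reserved characters.
my @params = ( [ query => 'AT&T / Bell' ], [ lang => 'en' ] );
my $query  = join '&',
    map { escape_component( $_->[0] ) . '=' . escape_component( $_->[1] ) } @params;

my $uri = "http://www.example.com/search?$query";
print "$uri\n";
```

The "&" inside the value comes out as %26, so it cannot be mistaken for the "&" that separates the two parameters.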
> Quoted below is part of one of my own posts in the thread (in reply to
> brian d foy). Looking at it now, the tone could've been a little (okay,
> a lot) more polite, but I still stand by the actual point.
>
> What Bart really wants to say is that it doesn't escape the reserved
> characters [;/?:@&=+$,]. This is broken, since according to RFC 2396
> these characters must be escaped _except when used for their reserved
> purpose_.
>
> Why is that broken, then? Because if the input contains any reserved
> characters that are not meant to be escaped, then the input must already
> be past the stage where escaping should be done.
>
> There are exactly two meaningful classes of characters to escape: One is
> [^A-Za-z0-9\-_.!~*'()], and the other is *no characters at all*. The
> former should be used on fragments of a URI before joining them, while
> the latter should be (not) used if you ever get the temptation to escape
> an already-composed URI.
>
> There's absolutely nothing HTTP- or CGI-specific about this. This is
> just basic RFC 2396 compliance.
>
> I still can't believe Gisle Aas got this wrong.
I was still young when this default was established :-)
Actually I think it was established by Tim Bunce at the time he hacked
on URI::URL. See for instance
<http://www.ics.uci.edu/pub/websoft/libwww-perl/archive/1995h1/0105.html>.
I'm not entirely sure that the uri_escape() function was introduced by
him. It might also have been a Martijn Koster thing. All I know is
that this all happened a long time ago and that it was certainly not
me that came up with that default :-)
The uri_escape() in that version of URI::URL was then later moved to
URI::Escape, but the original default $unsafe arg was kept. I guess I
can be blamed for not changing the default at this point.
> I wonder if changing
> the default character class would break more existing code than it would
> fix.
>
> This last paragraph is what I wanted to ask you directly. Obviously we
> can talk about this on Usenet 'til the cows come home, but I'd really
> like to know if there's a reason for the current behavior, and whether
> it would be practical to change it at this point.
As it seems unlikely that any _working_ code could be using this
function, it might be ok to simply change the default. But it has
been this way for such a long time (more than 6 years) that I still
hesitate a bit.
I see 2 ways of changing the default:
1) Remove % from the current set. (URI.pm already considers
% to be part of URIC, although this is a bit internal.)
2) Go with your suggestion: [^A-Za-z0-9\-_.!~*'()]
It looks like 1) is less likely to break code, but perhaps 2) is a
more useful default.
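A rough sketch of what option 2) would change for a single query value. The class in $leaves_reserved is illustrative only and is NOT the real current default. It merely leaves the reserved characters alone, which is how the thread describes the current behaviour. The helper name escape_with and the sample value are made up.

```perl
use strict;
use warnings;

# Byte-to-%XX table, standing in for %URI::Escape::escapes.
my %escapes = map { chr($_) => sprintf("%%%02X", $_) } 0 .. 255;

# Hypothetical helper: escape every character matched by the class body.
sub escape_with {
    my ($text, $class) = @_;
    $text =~ s/([$class])/$escapes{$1}/g;
    return $text;
}

# Option 2): escape everything except the RFC 2396 unreserved characters.
my $option2 = q{^A-Za-z0-9\-_.!~*'()};

# Illustrative assumption, not the actual default: additionally leave the
# reserved characters ;/?:@&=+$, unescaped.
my $leaves_reserved = q{^A-Za-z0-9\-_.!~*'();/?:@&=+$,};

my $value = 'fish&chips=50%';

print escape_with( $value, $leaves_reserved ), "\n";   # fish&chips=50%25
print escape_with( $value, $option2 ),         "\n";   # fish%26chips%3D50%25
```

With the first class the "&" and "=" in the value survive and would be read as delimiters once the value is placed in a query string. With option 2) they are escaped along with the "%".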
Regards,
Gisle