Re: uri_escape unsafe characters

Andrew Pimlott Tue, 17 Oct 2000 12:55:47 -0700
On Tue, Oct 17, 2000 at 11:08:59AM -0700, Randal L. Schwartz wrote:
> >>>>> "Andrew" == Andrew Pimlott <[EMAIL PROTECTED]> writes:
> Andrew> Constructing
> Andrew> query strings isn't rocket science,
> 
> Apparently it is, because people are getting it wrong.

They are getting it right except for one little piece, about which I
think the confusion is justified--and avoidable (see below).

> Andrew>  and most CGI programmers have in
> Andrew> their head, "escape the names and values, then separate them with =s
> Andrew> and &s".
> 
> Just because most people are confused

Nothing in the above is confused.  Obviously, it is imprecise, but
presuming a modicum of sense in adding the =s and &s, it is (to my
knowledge) right.  Don't you think it's nice to have rules that are
simple enough to remember? :-)
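
As a minimal sketch of that rule (in Python purely for illustration;
build_query is a hypothetical helper, not part of URI::Escape), the
whole thing comes down to escaping every name and value with no
characters exempted, and only then adding the delimiters:

```python
from urllib.parse import quote


def build_query(pairs):
    """Escape each name and value, then join them with '=' and '&'."""
    # safe="" tells quote() to escape every reserved character,
    # including '&', '=', '+' and '/', so none of them can be
    # mistaken for a delimiter once the pieces are joined.
    return "&".join(
        quote(name, safe="") + "=" + quote(value, safe="")
        for name, value in pairs
    )


print(build_query([("q", "R&D = fun"), ("lang", "en")]))
# prints: q=R%26D%20%3D%20fun&lang=en
```

The only "&" and "=" left in the result are the ones added as
separators, which is the whole point of escaping first.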

> Andrew>   Which is perfectly fine, except that the obvious candidate
> Andrew> for "escape" doesn't (by default) support this use.  That is
> Andrew> surprising.
> 
> Not at all.  It does the escaping for a URL, not the additional
> requirements for a parameter within the query form because the normal
> query processing uses & as an *additional* delimiter.

"the escaping for a URL" is a meaningless phrase, which I think is
the heart of the confusion.  You mean, of course, the escaping for
the path part of the URL.  However, there is no justification in RFC
2396 for considering path component escaping as "normal", or query
string component escaping as "additional".  Instead, it says,

   Normally, the only time escape encodings can safely be made is
   when the URI is being created from its component parts; each
   component may have its own set of characters that are reserved,
   so only the mechanism responsible for generating or interpreting
   that component can determine whether or not escaping a character
   will change its semantics. 

Given this, it seems that a nice (i.e., the Right) interface to
URI::Escape would require passing a string and a component.  Adding
that interface would go far towards improving programmer
understanding and reducing mistakes.

By the way, the RFC also says, "Within a path segment, the
characters "/", ";", "=", and "?" are reserved."  Thus, if
URI::Escape really is for escaping paths, "=" should be added to the
list of escaped characters.  Also, "The tilde '~' character was
added to those in the 'unreserved' set", so it should be removed
from that list.
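
To make the string-plus-component idea concrete, here is a rough
sketch, again in Python rather than Perl.  The names
escape_component and SAFE_BY_COMPONENT are hypothetical, and the
per-component sets are my own conservative reading of the RFC text
quoted above, not anything URI::Escape provides:

```python
from urllib.parse import quote

# Characters left unescaped in each component, in addition to
# letters, digits, "-", "_" and ".".  Hypothetical table for
# illustration only.  "~" is listed explicitly because the RFC
# counts it as unreserved.
SAFE_BY_COMPONENT = {
    # "/", ";", "=" and "?" are reserved within a path segment,
    # so they are NOT listed here and get escaped.
    "path_segment": ":@&+$,~",
    # Inside a query name or value, "&" and "=" act as delimiters,
    # so nothing beyond "~" is exempted.
    "query_name": "~",
    "query_value": "~",
}


def escape_component(text, component):
    """Escape text for use as data inside the named URI component."""
    if component not in SAFE_BY_COMPONENT:
        raise ValueError("unknown URI component: %r" % (component,))
    return quote(text, safe=SAFE_BY_COMPONENT[component])


print(escape_component("a/b=c~d", "path_segment"))  # a%2Fb%3Dc~d
print(escape_component("R&D", "path_segment"))      # R&D
print(escape_component("R&D", "query_value"))       # R%26D
```

The same input comes out differently depending on where it is going,
which is exactly what a component-blind interface cannot express.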

Let me add that I'm not an expert on this RFC, and have just been
reading it in an attempt to understand this particular issue, so
it's possible I have made mistakes.

Andrew