Re: uri escaping spaces

Michael Blakeley Wed, 28 Jun 2000 00:21:48 -0700
At 8:00 PM -0700 6/27/2000, Chris Thorman wrote:
>In the URI escaping spec, '+' is the special (optional) escape code 
>for space.  But since %XX works for any character, %20 also works 
>for space.

I can't find anything like that in 
http://www.ietf.org/rfc/rfc2396.txt - do you have any references? 
Here's what I do see:

    2.4.1. Escaped Encoding

    An escaped octet is encoded as a character triplet, consisting of the
    percent character "%" followed by the two hexadecimal digits
    representing the octet code. For example, "%20" is the escaped
    encoding for the US-ASCII space character.

       escaped     = "%" hex hex
       hex         = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                             "a" | "b" | "c" | "d" | "e" | "f"

Nothing about '+'. It is listed as a reserved character elsewhere 
in the RFC, but the RFC does not assign it any particular meaning. 
My interpretation is that '+' is simply a reserved character, 
traditionally used to delimit CGI params with multiple arguments, 
such as
        ?search=foo+bar
but serving no specific purpose in the URI standard beyond its 
reserved-ness. Perhaps there's a tradition of using '+' and %20 
interchangeably, but IMO it's better to be fully RFC-compliant with 
output, while still accepting non-standard input as liberally as 
possible.
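
To make the output side concrete, here is a minimal C sketch. The 
function name escape_uri_sketch is made up for illustration; it is not 
Embperl's or Apache's code. It assumes the input is a single URI 
component and escapes every octet outside RFC 2396's "unreserved" set 
(alphanumerics plus the "mark" characters) as %XX, so a space always 
comes out as %20:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of Embperl): percent-encode every octet
 * that is not an RFC 2396 "unreserved" character.
 * Returns a malloc'ed string the caller must free, or NULL on failure. */
char *escape_uri_sketch(const char *in)
{
    static const char hex[]  = "0123456789ABCDEF";
    static const char mark[] = "-_.!~*'()";   /* RFC 2396 "mark" set */
    char *out = malloc(3 * strlen(in) + 1);   /* worst case: all escaped */
    char *p = out;

    if (out == NULL)
        return NULL;

    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;

        if ((c < 128 && isalnum(c)) || strchr(mark, c) != NULL) {
            *p++ = (char)c;                   /* unreserved: copy as-is */
        } else {
            *p++ = '%';                       /* everything else: %XX */
            *p++ = hex[c >> 4];
            *p++ = hex[c & 0x0F];
        }
    }
    *p = '\0';
    return out;
}
```

With this, javascript('foo bar') becomes javascript('foo%20bar'). Note 
that the sketch also escapes reserved characters such as '+' (as %2B), 
so it is only suitable for one component at a time, not a whole URI.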

CGI.pm mentions a slightly relevant case:
      If the script was invoked with a parameter list (e.g.
      "name1=value1&name2=value2&name3=value3"), the param()
      method will return the parameter names as a list.  If the
      script was invoked as an <ISINDEX> script and contains a
      string without ampersands (e.g. "value1+value2+value3") ,
      there will be a single parameter named "keywords" containing
      the "+"-delimited keywords.

And, if you trust CGI.pm's implementation:
        use strict;
        use warnings;
        use CGI;

        # One parameter with three separate values
        my $q = CGI->new({});
        $q->param("foo", qw/bar baz biz/);
        print $q->query_string(), "\n";

        # One parameter whose single value contains spaces
        $q = CGI->new({});
        $q->param("foo", "bar baz biz");
        print $q->query_string(), "\n";

outputs:
        foo=bar;foo=baz;foo=biz
        foo=bar%20baz%20biz

>At 4:54 PM -0700 6/27/00, Michael Blakeley wrote:
>>With the default $escapemode, Embperl seems to encode
>>      javascript('foo bar')
>>as
>>      javascript('foo+bar')
>>but I would have expected
>>      javascript('foo%20bar')
>>like Apache::Util::escape_uri() does it. The '+', to me, means 
>>multiple options.
>>
>>Am I misguided? Or is Embperl?

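I'm convinced that the conversion really ought to be
        ' '->%20
The patch below (diff against 1.3b3) makes that the default and keeps 
the old '+' behavior behind #ifdef ENCODE_SPACES_AS_PLUS. Note that it 
also breaks 'make test', presumably because the expected test output 
still has '+' for spaces.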

Any RFC-compliant client must translate %20 to space, but not every 
client will translate '+' to space, since that seems to be traditional 
behavior rather than RFC-specified behavior. For example, when 
Netscape 4.73 submits a form, it translates
        <INPUT type=text name="Text" size=20 maxlength=128 VALUE="foo bar">
to
        ?Text=foo+bar
Apache::param() also seems to be liberal about accepting '+' for %20. 
But it doesn't work that way with the "thin client" I'm coding for. I 
don't think I can argue that the client's wrong without a standards 
doc that says so. I'd also rather that Embperl be conservative about 
what it sends out - adhering to the RFC rather than to common, but 
non-RFC behavior.
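
For the accepting side, a liberal decoder might look like the C sketch 
below. The names unescape_query_sketch and hex_value and the 
plus_is_space flag are all made up for illustration; this is not how 
Embperl or Apache actually does it. It decodes %XX triplets and, only 
when asked, also treats '+' as a space:

```c
#include <stdlib.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper (not part of Embperl): decode %XX escapes and,
 * if plus_is_space is non-zero, also map '+' to a space.
 * Malformed escapes are copied through unchanged.
 * Returns a malloc'ed string the caller must free, or NULL on failure. */
char *unescape_query_sketch(const char *in, int plus_is_space)
{
    char *out = malloc(strlen(in) + 1);   /* decoding never grows the text */
    char *p = out;

    if (out == NULL)
        return NULL;

    while (*in != '\0') {
        if (*in == '%') {
            int hi = hex_value((unsigned char)in[1]);
            /* Only look at in[2] if in[1] was a hex digit (and so not NUL). */
            int lo = (hi >= 0) ? hex_value((unsigned char)in[2]) : -1;

            if (hi >= 0 && lo >= 0) {
                *p++ = (char)(hi * 16 + lo);
                in += 3;
                continue;
            }
        }
        *p++ = (*in == '+' && plus_is_space) ? ' ' : *in;
        in++;
    }
    *p = '\0';
    return out;
}
```

A strict client would call it with plus_is_space set to 0 and see 
foo+bar unchanged; a liberal server would pass 1 and get "foo bar" from 
either foo+bar or foo%20bar.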

-- Mike

diff -c epchar.c.orig epchar.c
*** epchar.c.orig       Tue Jun 27 17:09:17 2000
--- epchar.c    Tue Jun 27 17:21:55 2000
***************
*** 324,330 ****
--- 324,335 ----
           { ' ' ,   "%1D"         },    /* &#29;                Unused  */
           { ' ' ,   "%1E"         },    /* &#30;                Unused  */
           { ' ' ,   "%1F"         },    /* &#31;                Unused  */
+ /* see http://www.ietf.org/rfc/rfc2396.txt */
+ #ifdef ENCODE_SPACES_AS_PLUS
           { ' ' ,   "+"           },    /*      &#32;           Space  */
+ #else
+         { ' ' ,   "%20"           },    /*    &#32;           Space  */
+ #endif
           { '!' ,   ""         },    /*         &#33;      Exclamation mark */
           { '"' ,   "%22"   },    /*    Quotation mark  */
           { '#' ,   ""         },    /*         &#35;           Number sign  */
