Re: help with ap_escape_uri()

David Wortham Sat, 05 May 2007 12:22:38 -0700

Thibaut,
  Point taken.  I didn't have any trouble with ap_escape_uri, but then
again I'm not testing on Debian.


  I fear the problem is that the "ap_escape_uri(...)" function has been
turned into a macro for the "ap_os_escape_path(...)" function (as you said
before).  It seems that the character mapping table used by the
ap_os_escape_path function is OS-dependent: I saw some tables which treated
both '&' and '?' as characters to be escaped and some which treated only
'?' that way.  I'm sure some of the other characters that should be escaped
will get skipped on certain OSes too.

  Since the list of escapable URI characters is governed by an RFC and is
OS-independent, the function should probably not have been merged into the
OS-dependent function (I saw mailing list archives around 2002 where coders
were actively changing code from using "ap_os_escape_path" calls to
"ap_escape_uri", so I am assuming they were once independent functions).

  My assumption is that this differentiation will only be seen in certain
OSes, but that the true "bug" is that "ap_escape_uri" is functionally the
same as "ap_os_escape_path" when they should be different.  In either case,
I think your solution is to use a 3rd party function (or write your own) to
URI-encode unless you can guarantee that your module is compiled against an
RFC-compliant URI_encode function.
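
  If you do write your own, a minimal sketch might look like the code below.
It percent-encodes every byte outside the RFC 3986 "unreserved" set (letters,
digits, '-', '.', '_', '~'), so '&' and '?' are both escaped regardless of
OS.  The name uri_encode_component is hypothetical, and malloc is used only
to keep the example self-contained; in a module you would presumably
allocate from a pool (e.g. with apr_palloc) instead.  It is meant for a
single path segment or parameter value, not a whole URI, since it also
escapes '/' and ':'.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the Apache API.
 * Returns 1 for RFC 3986 "unreserved" characters, 0 otherwise. */
static int is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

/* Hypothetical helper: percent-encodes every byte of `in` that is not
 * unreserved.  Returns a malloc'd string (caller frees), or NULL if the
 * allocation fails. */
char *uri_encode_component(const char *in)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t len = strlen(in);
    char *out = malloc(3 * len + 1);  /* worst case: every byte becomes %XX */
    char *p = out;

    if (out == NULL)
        return NULL;

    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if (is_unreserved(c)) {
            *p++ = (char)c;
        } else {
            *p++ = '%';
            *p++ = hex[c >> 4];
            *p++ = hex[c & 0x0F];
        }
    }
    *p = '\0';
    return out;
}
```

  With this, uri_encode_component("a&b?c.txt") gives "a%26b%3Fc.txt".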

Regards,
Dave

P.S. Please DON'T CC me in replies unless it is a BCC.  I am trying to make
it harder for email harvesters to get my addresses, not easier.  Thanks.




On 5/5/07, Thibaut VARENE <[EMAIL PROTECTED]> wrote:

On 5/5/07, David Wortham <[EMAIL PROTECTED]> wrote:
> Thibaut,
>    As far as I know, URI escaping functions escape all non-alphanumerics
> which are not in the following set of characters: {'-', '_', ':', '/', '?',
> '=', '&', '#', '.'} (there may be others I can't think of right now).  If a
> character is in that set of characters, the URI remains "legal" even if the
> character is unescaped.  This set of characters is

That doesn't seem correct: ap_escape_uri() certainly escapes ';', '#'
and '?' for instance (I just verified this).

>    A reason for this:
> If you start with a link
> (http://www.nowhere.com/some_dir?where_you_going=nowhere#top),
> there are a number of special characters that are required to parse the URI
> correctly.
> Without these characters: {'/', ':'}, there can be no "http://".
>  Without this character: {'?'}, there is no query string... only a run-on
> directory-path.
>  Without this character: {'#'}, there is no anchor... only an incorrectly
> long GET parameter value.

I agree but I don't think that's the scope of ap_escape_uri() (which
is ap_os_escape_path() behind the scenes). I understand that this
function should precisely escape all 'reserved' characters found in
file paths so that they do not interfere with the normal parsing of
queries. The issue here is exactly that: if a filename contains a '&',
it will be interpreted as part of an argument list and break anything that
does URL parsing (such as what is reported in the Debian bug report I
pointed at). I don't get why ap_escape_uri() correctly escapes '?' to
avoid this, but not '&'.

>    This is not a bug; you need to manually escape any of the special
> characters (probably called URI META characters or something like that) if
> you expect them to be URL-encoded.  If all '&' characters were URI-escaped
> all of the time, there would be no way to create a GET parameter list; there
> would never be more than one parameter.

See above, I still believe this is a bug, or there's some kind of
inconsistency I don't understand... RFC 1738 seems to claim that:

"Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
   reserved characters used for their reserved purposes may be used
   unencoded within a URL."

/reserved characters for their reserved purpose may be used
unencoded/, but it says that outside of their scope they must also be
encoded. My understanding is that ap_os_escape_path() should only be
used on the path part of a URL and as such it should encode the
reserved characters that are not to be found in the path part of
said URL... That includes '&'.

>    As for a workaround, you will need to find a pool-friendly (assuming you
> are using pools for memory allocation in this specific instance)
> character/substring replacement function.  You will likely want to do a
> straight encode of all components of a URI separately with this function
> then use the ap_escape_uri().  I am not familiar with a particular function
> that will do the trick, but I use a pool-modified version of a Yahoo!
> C-library function for URL-encoding.

It seems extremely overkill and costly to me to have to do a second
pass of search-n-replace just to escape '&' that ap_escape_uri() has
left aside...
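
For what it's worth, the second pass in question could look roughly like
the sketch below, which copies an already-escaped path and rewrites each
remaining '&' as "%26". The name escape_ampersands is hypothetical and
malloc stands in for pool allocation; it should only be applied to the
path part, never to a full URL that carries a query string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the Apache API.
 * Copies `escaped`, rewriting each '&' as "%26".
 * Returns a malloc'd string (caller frees), or NULL if allocation fails. */
char *escape_ampersands(const char *escaped)
{
    size_t extra = 0;
    const char *s;
    char *out, *p;

    /* First scan: count how much the string will grow. */
    for (s = escaped; *s != '\0'; s++)
        if (*s == '&')
            extra += 2;   /* "&" (1 byte) becomes "%26" (3 bytes) */

    out = malloc(strlen(escaped) + extra + 1);
    if (out == NULL)
        return NULL;

    /* Second scan: copy, substituting as we go. */
    for (s = escaped, p = out; *s != '\0'; s++) {
        if (*s == '&') {
            memcpy(p, "%26", 3);
            p += 3;
        } else {
            *p++ = *s;
        }
    }
    *p = '\0';
    return out;
}
```

The cost is two linear scans plus one allocation per path.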

Thanks for your feedback, but I'd like to see more arguments claiming
that this is a feature and not a bug ;)

Thibaut

PS: please CC-me in replies.

--
Thibaut VARENE
http://www.parisc-linux.org/~varenet/




--
David Wortham
Senior Web Applications Developer
Unspam Technologies, Inc.
(408) 338-8863
