On 5/5/07, David Wortham <[EMAIL PROTECTED]> wrote:
Thibaut, As far as I know, URI escaping functions escape all non-alphanumeric characters that are not in the following set: {'-', '_', ':', '/', '?', '=', '&', '#', '.'} (there may be others I can't think of right now). If a character is in that set, the URI remains "legal" even if the character is left unescaped. This set of characters is
That doesn't seem correct: ap_escape_uri() certainly escapes ';', '#' and '?', for instance (I just verified this).
A reason for this: If you start with a link (http://www.nowhere.com/some_dir?where_you_going=nowhere#top), there are a number of special characters that are required to parse the URI correctly. Without the characters {'/', ':'}, there can be no "http://". Without the character {'?'}, there is no query string... only a run-on directory path. Without the character {'#'}, there is no anchor... only an incorrectly long GET parameter value.
I agree, but I don't think that's the scope of ap_escape_uri() (which is ap_os_escape_path() behind the scenes). My understanding is that this function should escape precisely those 'reserved' characters found in file paths, so that they do not interfere with the normal parsing of queries. The issue here is exactly that: if a filename contains a '&', the '&' will be interpreted as an argument separator and break anything that does URL parsing (such as what is reported in the Debian bug report I pointed at). I don't get why ap_escape_uri() correctly escapes '?' to avoid this, but not '&'.
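To make the failure mode concrete, here is a minimal sketch of what a naive consumer does with such a URL. The helper name count_query_params() is hypothetical and not part of any Apache API; it only assumes that parameters are split on '&', as most query parsers do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an Apache function): counts the parameters
 * a naive parser would see when it splits a query string on '&'. */
size_t count_query_params(const char *query)
{
    size_t count;
    const char *p;

    assert(query != NULL);
    if (*query == '\0')
        return 0;               /* empty query: no parameters */

    count = 1;                  /* at least one parameter present */
    for (p = query; *p != '\0'; p++) {
        if (*p == '&')
            count++;            /* every raw '&' starts a new parameter */
    }
    return count;
}
```

With a filename such as "Tom&Jerry.txt", the query "file=Tom&Jerry.txt" is counted as two parameters, while "file=Tom%26Jerry.txt" is counted as one.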
This is not a bug; you need to manually escape any of the special characters (probably called URI META characters or something like that) if you expect them to be URL-encoded. If all '&' characters were URI-escaped all of the time, there would be no way to create a GET parameter list; there would never be more than one parameter.
See above, I still believe this is a bug, or there's some kind of inconsistency I don't understand... RFC1738 seems to claim that: "Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL." So /reserved characters used for their reserved purposes may be used unencoded/, but outside of that purpose they must also be encoded. My understanding is that ap_os_escape_path() should only be used on the path part of a URL, and as such it should encode the reserved characters that are not supposed to appear in the path part of said URL... That includes '&'.
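The behaviour argued for here can be sketched as follows. This is only an illustration under stated assumptions: escape_path_strict() and is_path_safe() are hypothetical names, the code uses malloc() rather than pools, and it is not what ap_os_escape_path() actually does. It keeps alphanumerics, the special characters from the RFC1738 sentence quoted above, and '/' as the path separator, and percent-encodes everything else.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical rule: alphanumerics, the RFC1738 "special" characters
 * quoted above, and '/' (path separator) may stay unencoded. */
int is_path_safe(unsigned char c)
{
    return isalnum(c) || (c != '\0' && strchr("$-_.+!*'(),/", c) != NULL);
}

/* Hypothetical strict path escaper. Returns a malloc()ed string that
 * the caller must free(), or NULL if allocation fails. */
char *escape_path_strict(const char *path)
{
    static const char hex[] = "0123456789ABCDEF";
    const unsigned char *r;
    char *out, *w;

    assert(path != NULL);
    out = malloc(3 * strlen(path) + 1);   /* worst case: every byte -> %XX */
    if (out == NULL)
        return NULL;

    w = out;
    for (r = (const unsigned char *)path; *r != '\0'; r++) {
        if (is_path_safe(*r)) {
            *w++ = (char)*r;
        } else {
            *w++ = '%';
            *w++ = hex[*r >> 4];          /* high nibble */
            *w++ = hex[*r & 0x0F];        /* low nibble */
        }
    }
    *w = '\0';
    return out;
}
```

Under this rule '&', '?', ';' and '#' in a filename are all treated the same way: each one is encoded because it is not serving its reserved purpose inside the path.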
As for a workaround, you will need to find a pool-friendly (assuming you are using pools for memory allocation in this specific instance) character/substring replacement function. You will likely want to do a straight encode of all components of a URI separately with this function, then use ap_escape_uri(). I am not familiar with a particular function that will do the trick, but I use a pool-modified version of a Yahoo! C-library function for URL-encoding.
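A replacement pass of the kind described above might look like the sketch below. It is not the Yahoo! function mentioned, and the name escape_ampersands() is hypothetical. It uses malloc() so that it stands alone; a pool-friendly version would presumably allocate from the relevant pool instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc()ed copy of `in` with every
 * '&' replaced by "%26". The caller must free() the result.
 * Returns NULL if allocation fails. */
char *escape_ampersands(const char *in)
{
    size_t extra = 0;
    const char *r;
    char *out, *w;

    assert(in != NULL);

    /* First pass: each '&' grows the string by two bytes. */
    for (r = in; *r != '\0'; r++) {
        if (*r == '&')
            extra += 2;
    }

    out = malloc(strlen(in) + extra + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy, substituting "%26" for '&'. */
    w = out;
    for (r = in; *r != '\0'; r++) {
        if (*r == '&') {
            memcpy(w, "%26", 3);
            w += 3;
        } else {
            *w++ = *r;
        }
    }
    *w = '\0';
    return out;
}
```

Applied to a single path component such as "Tom&Jerry.txt", this yields "Tom%26Jerry.txt". It must only be run on individual components, never on a complete URL whose '&' characters separate real parameters.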
It seems extremely overkill and costly to me to have to do a second pass of search-n-replace just to escape '&' that ap_escape_uri() has left aside...

Thanks for your feedback, but I'd like to see more arguments claiming that this is a feature and not a bug ;)

Thibaut

PS: please CC-me in replies.

--
Thibaut VARENE
http://www.parisc-linux.org/~varenet/