Thibaut,

Point taken. I didn't have any trouble with ap_escape_uri, but then again I'm not testing on Debian.
I fear the problem is that the "ap_escape_uri(...)" function has been turned into a macro for the "ap_os_escape_path(...)" function (as you said before). It seems that the character mapping table used by ap_os_escape_path is OS-dependent: I saw some tables which treated both '&' and '?' as characters to be escaped, and some which treated only '?' that way. I'm sure some of the other characters that should be escaped will get skipped on certain OSes too.

Since the list of escapable URI characters is governed by an RFC and is OS-independent, the function should probably not have been merged into the OS-dependent function. (I saw mailing list archives from around 2002 where coders were actively changing code from "ap_os_escape_path" calls to "ap_escape_uri", so I am assuming they were once independent functions.) My assumption is that the difference will only show up on certain OSes, but that the true "bug" is that "ap_escape_uri" is functionally the same as "ap_os_escape_path" when they should be different.

In either case, I think your solution is to use a third-party function (or write your own) to URI-encode, unless you can guarantee that your module is compiled against an RFC-compliant URI-encoding function.

Regards,
Dave

P.S. Please DON'T CC me in replies unless it is a BCC. I am trying to make it harder for email harvesters to get my addresses, not easier. Thanks.

On 5/5/07, Thibaut VARENE <[EMAIL PROTECTED]> wrote:
On 5/5/07, David Wortham <[EMAIL PROTECTED]> wrote:
> Thibaut,
> As far as I know, URI escaping functions escape all non-alphanumerics
> which are not in the following set of characters: {'-', '_', ':', '/', '?',
> '=', '&', '#', '.'} (there may be others I can't think of right now). If a
> character is in that set of characters, the URI remains "legal" even if the
> character is unescaped. This set of characters is

That doesn't seem correct: ap_escape_uri() certainly escapes ';', '#' and '?', for instance (I just verified this).

> A reason for this:
> If you start with a link
> (http://www.nowhere.com/some_dir?where_you_going=nowhere#top),
> there are a number of special characters that are required to parse the URI
> correctly.
> Without these characters: {'/', ':'}, there can be no "http://".
> Without this character: {'?'}, there is no query string... only a run-on
> directory-path.
> Without this character: {'#'}, there is no anchor... only an incorrectly
> long GET parameter value.

I agree, but I don't think that's the scope of ap_escape_uri() (which is ap_os_escape_path() behind the scenes). As I understand it, this function is meant precisely to escape all 'reserved' characters found in file paths so that they do not interfere with the normal parsing of queries. The issue here is exactly that: if a filename contains a '&', it will be interpreted as an argument separator and break anything that does URL parsing (such as what is reported in the Debian bug report I pointed at). I don't get why ap_escape_uri() correctly escapes '?' to avoid this, but not '&'.

> This is not a bug; you need to manually escape any of the special
> characters (probably called URI META characters or something like that) if
> you expect them to be URL-encoded. If all '&' characters were URI-escaped
> all of the time, there would be no way to create a GET parameter list; there
> would never be more than one parameter.
See above, I still believe this is a bug, or there's some kind of inconsistency I don't understand...

RFC1738 seems to claim that:
"Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL."

In other words, reserved characters may be used unencoded for their reserved purpose, but outside of that purpose they must also be encoded. My understanding is that ap_os_escape_path() should only be used on the path part of a URL, and as such it should encode the reserved characters that are not supposed to appear in the path part of said URL... That includes '&'.

> As for a workaround, you will need to find a pool-friendly (assuming you
> are using pools for memory allocation in this specific instance)
> character/substring replacement function. You will likely want to do a
> straight encode of all components of a URI separately with this function
> then use the ap_escape_uri(). I am not familiar with a particular function
> that will do the trick, but I use a pool-modified version of a Yahoo!
> C-library function for URL-encoding.

It seems extremely overkill and costly to me to have to do a second pass of search-n-replace just to escape the '&' that ap_escape_uri() has left aside...

Thanks for your feedback, but I'd like to see more arguments claiming that this is a feature and not a bug ;)

Thibaut

PS: please CC me in replies.

--
Thibaut VARENE
http://www.parisc-linux.org/~varenet/
--
David Wortham
Senior Web Applications Developer
Unspam Technologies, Inc.
(408) 338-8863
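For reference, the "write your own" route mentioned above might look something like the following. This is only a minimal sketch, not anything taken from Apache: the names escape_path_segment and is_unreserved are made up for illustration, and it assumes the input is a single path component (a filename), so that '/', '&', '?' and friends should all be percent-encoded. It leaves alone only the "unreserved" set from RFC 3986 (letters, digits, '-', '.', '_', '~'), which is stricter than the RFC 1738 list quoted in the thread. It uses malloc() to stay self-contained; inside a module you would presumably allocate from the request pool instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the Apache API): returns nonzero for
 * characters that RFC 3986 calls "unreserved" and that never need escaping. */
static int is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

/* Hypothetical helper: percent-encode one path segment independently of the
 * OS-specific tables. Returns a malloc()ed string the caller must free(),
 * or NULL if allocation fails. Assumption: in a real module the buffer
 * would come from a pool rather than malloc(). */
char *escape_path_segment(const char *in)
{
    static const char hex[] = "0123456789ABCDEF";
    char *out, *p;

    assert(in != NULL);

    /* Worst case: every input byte becomes "%XX", plus the terminating NUL. */
    out = malloc(3 * strlen(in) + 1);
    if (out == NULL)
        return NULL;

    p = out;
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if (is_unreserved(c)) {
            *p++ = (char)c;
        } else {
            *p++ = '%';
            *p++ = hex[c >> 4];
            *p++ = hex[c & 0x0F];
        }
    }
    *p = '\0';
    return out;
}
```

With this, escape_path_segment("a&b?c.txt") returns "a%26b%3Fc.txt", so both '&' and '?' get encoded in a single pass regardless of platform. Note that it would also encode '/', so it has to be applied to each path component separately and the results joined afterwards.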