http://nagoya.apache.org/bugzilla/show_bug.cgi?id=12798

Path should not be encoded in HttpMethodBase

------- Additional Comments From [EMAIL PROTECTED] 2003-03-07 23:13 -------

This patch doesn't look right to me. I'm no expert, but I have just recently had to review our URL encoding code, so it concerns me that we don't seem to be encoding the query string.

RFC 1738 (Uniform Resource Locators) specifies that:

    Octets must be encoded if they have no corresponding graphic character within the US-ASCII coded character set, if the use of the corresponding character is unsafe, or if the corresponding character is reserved for some other interpretation within the particular URL scheme.

The unsafe characters listed in the RFC are:

    "{", "}", "|", "\", "^", "~", "[", "]", "`", "<", ">", """, "#", "%"

In addition, the reserved characters are:

    ";", "/", "?", ":", "@", "=" and "&"

It then adds:

    Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL. On the other hand, characters that are not required to be encoded (including alphanumerics) may be encoded within the scheme-specific part of a URL, as long as they are not being used for a reserved purpose.

What this implies to me is that the process for encoding any given URL is:

1. Break the URL into its various parts. For HTTP this would be: http://<host>:<port>/<path>?<searchpart>
2. Take each part of the URL and encode it (though one would hope that a host name contains only US-ASCII characters, or the DNS system is going to have trouble with it anyway).
3. Reassemble the URL.

I'm unsure whether the URL we are given is already encoded or not, and the JavaDocs for the methods do not specify this. So the first action item of this bug must be to decide whether methods should be passed an encoded or an unencoded URL, and to document that decision.
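To make the three steps above concrete, here is a minimal sketch in plain Java. It is not the HttpClient code and not the patch under review: the class name UrlEncodingSketch and its methods are hypothetical, UTF-8 is assumed as the byte encoding, and the set of reserved characters left unencoded in each part follows my reading of the RFC 1738 HTTP grammar, so treat those sets as assumptions.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only (not part of HttpClient).
class UrlEncodingSketch {
    // Special characters RFC 1738 allows unencoded anywhere in a URL.
    private static final String SAFE_SPECIALS = "$-_.+!*'(),";

    // Step 2: percent-encode every octet that is not alphanumeric, not a
    // safe special, and not a reserved character allowed in this part.
    static String encodePart(String part, String reservedAllowed) {
        StringBuilder out = new StringBuilder();
        for (byte b : part.getBytes(StandardCharsets.UTF_8)) { // assumption: UTF-8
            int octet = b & 0xFF;
            char c = (char) octet;
            boolean alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (octet < 0x80 && (alnum || SAFE_SPECIALS.indexOf(c) >= 0
                    || reservedAllowed.indexOf(c) >= 0)) {
                out.append(c);
            } else {
                out.append(String.format("%%%02X", octet));
            }
        }
        return out.toString();
    }

    // Steps 1 and 3: split an UNENCODED URL of the form
    // scheme://host:port/path?searchpart, encode each part, reassemble.
    static String encodeUrl(String rawUrl) {
        int schemeEnd = rawUrl.indexOf("://");
        if (schemeEnd < 0) {
            throw new IllegalArgumentException("No scheme in: " + rawUrl);
        }
        String scheme = rawUrl.substring(0, schemeEnd);
        String rest = rawUrl.substring(schemeEnd + 3);

        // The first '?' separates the search part from everything before it.
        int queryStart = rest.indexOf('?');
        String query = queryStart < 0 ? null : rest.substring(queryStart + 1);
        String beforeQuery = queryStart < 0 ? rest : rest.substring(0, queryStart);

        // The first '/' separates host:port from the path.
        int pathStart = beforeQuery.indexOf('/');
        String authority = pathStart < 0 ? beforeQuery : beforeQuery.substring(0, pathStart);
        String path = pathStart < 0 ? "" : beforeQuery.substring(pathStart);

        StringBuilder out = new StringBuilder(scheme).append("://")
                .append(encodePart(authority, ":"))     // keep the port separator
                .append(encodePart(path, "/;:@&="));    // keep segment separators
        if (query != null) {
            out.append('?').append(encodePart(query, ";:@&="));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints http://example.com:8080/my%20docs/a%7Bb%7D.txt?q=x%20y&lang=en
        System.out.println(encodeUrl("http://example.com:8080/my docs/a{b}.txt?q=x y&lang=en"));
        // An already-encoded URL gets encoded twice:
        // prints http://example.com/a%2520b
        System.out.println(encodeUrl("http://example.com/a%20b"));
    }
}
```

The second call shows why the encoded-versus-unencoded question has to be settled first: feeding an already-encoded URL through an encoder turns "%20" into "%2520".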
IF we decide that URLs passed into the methods should be encoded, then we need to stop encoding the path. On the other hand, IF we decide that URLs passed into the methods should be unencoded, then we need to encode the query string as well. Also, if all URLs are passed in already encoded, then we should have no need for URL encoding functionality internally, as we would only ever use encoded URLs.

My suggestion would be to only ever work with encoded URLs internally, and then do one of the following:

1. Add a new constructor to each of the methods which takes a boolean indicating whether the URL is encoded or not. If it is not, we encode it before passing it through to anywhere else.
2. Provide the URIUtils class (possibly as a separate project) to allow the user to easily encode URLs. We should ensure that there is a method in URIUtils that can take a full URL containing non-displayable US-ASCII characters and unsafe characters (but no extra reserved characters) and encode it correctly. This saves the user from having to break up the URL to encode it.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
