Re: [development] URL encoding

Scott Reynen Sat, 13 Mar 2010 21:45:51 -0800

On Mar 13, 2010, at 8:20 PM, nitin gupta wrote:

I completely agree with what you and Scott are trying to say. But I am not looking to create a URL, just to sanitize it to remove disallowed characters, i.e. what a browser would do while accessing a URL when a user inputs a URL. Consider, I parse the following URL from XML:
http://example.com?test/com
Do you think I should encode the '/' in the query part i.e. [test/com]??

Technically, yes, but that's beside the point. Regardless of how strictly you choose to apply URL encoding, you should be applying it to specific URL parts, not full URLs.

I don't think we need to. (Nor will Firefox, if you enter this URL in the address bar).

You're right that encoding the slash character isn't particularly important in the query. In a path segment, however, the difference between encoded and unencoded slashes is very significant; http://example.com/a/b/c is different from http://example.com/a%2fb/c. And a slash definitely shouldn't be encoded where it's used as a delimiter between URL components. This is actually a good example of why encoding must be applied to individual URL components, not the full URL.
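To make the path-segment distinction concrete, here is a minimal Python sketch using the standard library's urllib.parse.quote. The helper name encode_path is hypothetical and only for illustration; it assumes the caller already has the path as a list of separate segments.

```python
from urllib.parse import quote

def encode_path(segments):
    """Percent-encode each path segment, then join with literal slashes.

    encode_path is a hypothetical helper, not part of any library.
    """
    # safe="" makes quote() escape "/" inside a segment as well.
    return "/" + "/".join(quote(segment, safe="") for segment in segments)

# Three segments: "a", "b", "c"
print(encode_path(["a", "b", "c"]))   # /a/b/c

# Two segments: "a/b" and "c" (here the slash is data, not a delimiter)
print(encode_path(["a/b", "c"]))      # /a%2Fb/c

# Encoding the whole URL in one pass escapes the delimiters too
print(quote("http://example.com/a/b/c", safe=""))
# http%3A%2F%2Fexample.com%2Fa%2Fb%2Fc
```

Encoding per segment keeps the delimiting slashes literal while escaping a slash that belongs to the data, which a single pass over the full URL cannot do.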

If a URL contains characters which are allowed in the URL dictionary, will we ever need to encode those characters? No.


What is the URL dictionary? Here's one of the relevant RFCs on URLs:

http://www.ietf.org/rfc/rfc3986.txt

Selected quotes:

"A percent-encoding mechanism is used to represent a data octet _in a component_"

"the conflicting data must be percent-encoded _before the URI is formed_"

Emphasis added to, well, emphasize that encoding applies to component parts.
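For the sanitizing case discussed in this thread, one possible approach is to split the URL, encode each component with its own set of allowed characters, and reassemble. The Python sketch below is only an illustration under assumptions: sanitize_url is a hypothetical name, the "safe" sets are one lenient choice (the slash is left alone in the query, and "%" is kept so existing escapes are not double-encoded), and the host and fragment are passed through untouched.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def sanitize_url(url):
    """Hypothetical helper: percent-encode disallowed characters per component."""
    parts = urlsplit(url)
    # Path: keep "/" as the segment delimiter and "%" for existing escapes.
    path = quote(parts.path, safe="/%")
    # Query: also keep "=", "&" and "?"; "/" is left unencoded here (assumption).
    query = quote(parts.query, safe="=&/?%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))

# The URL from the question is left as it is.
print(sanitize_url("http://example.com?test/com"))
# http://example.com?test/com

# Spaces are encoded inside each component; delimiters are untouched.
print(sanitize_url("http://example.com/a b?q=x y"))
# http://example.com/a%20b?q=x%20y
```

Because the URL is split first, the "://", "/" and "?" delimiters are never candidates for encoding; only the content of each component is.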


--
Scott Reynen
MakeDataMakeSense.com
