Now just '%25' is lost from the original URL, though that is still not
acceptable.

On 3/12/10, nitin gupta <[email protected]> wrote:
> Scott,
>
> I was just going to post this, but you caught me. Yes, there is a
> problem with the solution proposed above, and you caught it right: it
> will not encode what should be encoded.
>
> You are right that such URLs must be corrected at the source, but that
> doesn't count as an excuse if my module fails to work properly
> (http://drupal.org/node/731798). As mentioned in the first post, I know
> rawurlencode is meant for components of URLs, but I have no choice, as
> I cannot assume which literals will be in the URL and which will not.
>
> Accidentally, while working on my other module (Facebook-style Links
> <http://drupal.org/project/facebook_link>), I found the same bug in
> Facebook. Try sharing http://www.google.com/search?q=%22a%26b%22 on
> Facebook and you will see that they are doing the same. ;) I haven't
> reported this yet, though.
>
> I changed the function to this; let me know your views:
>
> function encode_url($url) {
>   $reserved = array(
>     ":" => '!%3A!ui',
>     "/" => '!%2F!ui',
>     "?" => '!%3F!ui',
>     "#" => '!%23!ui',
>     "[" => '!%5B!ui',
>     "]" => '!%5D!ui',
>     "@" => '!%40!ui',
>     "!" => '!%21!ui',
>     "$" => '!%24!ui',
>     "&" => '!%26!ui',
>     "'" => '!%27!ui',
>     "(" => '!%28!ui',
>     ")" => '!%29!ui',
>     "*" => '!%2A!ui',
>     "+" => '!%2B!ui',
>     "," => '!%2C!ui',
>     ";" => '!%3B!ui',
>     "=" => '!%3D!ui',
>   );
>
>   $url = rawurlencode($url);
>   $url = preg_replace(array_values($reserved), array_keys($reserved), $url);
>   $url = preg_replace('!%25!ui', '%', $url);
>   return ($url);
> }
>
> I am still testing, so let me know if some case fails for the above
> function.
>
> --
> Regards,
> Nitin Kumar Gupta
> http://publicmind.in/blog/
>
>
> On Fri, Mar 12, 2010 at 7:29 AM, Scott Reynen <[email protected]> wrote:
>
>> On Mar 11, 2010, at 11:10 AM, nitin gupta wrote:
>>
>>> I am using the following to solve the problem, any ideas to improve
>>> it in terms of efficiency or otherwise are welcome:
>>>
>>> function encodeurl($url) {
>>>   $reserved = array(
>>>     ":" => '!%3A!ui',
>>>     "/" => '!%2F!ui',
>>>     "?" => '!%3F!ui',
>>>     "#" => '!%23!ui',
>>>     "[" => '!%5B!ui',
>>>     "]" => '!%5D!ui',
>>>     "@" => '!%40!ui',
>>>     "!" => '!%21!ui',
>>>     "$" => '!%24!ui',
>>>     "&" => '!%26!ui',
>>>     "'" => '!%27!ui',
>>>     "(" => '!%28!ui',
>>>     ")" => '!%29!ui',
>>>     "*" => '!%2A!ui',
>>>     "+" => '!%2B!ui',
>>>     "," => '!%2C!ui',
>>>     ";" => '!%3B!ui',
>>>     "=" => '!%3D!ui',
>>>   );
>>>
>>>   $url = rawurlencode(rawurldecode($url));
>>>   $url = preg_replace(array_values($reserved), array_keys($reserved), $url);
>>>   return $url;
>>> }
>>
>> There's an old quote [1] that seems somewhat apt here:
>>
>>> Some people, when confronted with a problem, think "I know, I'll use
>>> regular expressions." Now they have two problems.
>>
>> That's not entirely apt, as your regular expression might as well be
>> done with str_replace(), but you are adding problems rather than
>> removing them. You should really scrap this whole thing and take a few
>> steps back rather than adding more to it; this will break URLs due to
>> flaws in the fundamental approach.
>>
>> rawurlencode and rawurldecode are meant to be used on fragments of
>> URLs, not whole URLs. It's impossible to properly encode an entire URL
>> without first breaking it up into component parts, because the
>> different parts require different encoding. For example, "/" should be
>> encoded in a query string, but not in a path. Treating it the same
>> everywhere is why you're having the problem with delimiters being
>> encoded.
>> The preg_replace() only hides this problem, while introducing new
>> problems (not encoding things that should be encoded); it's not a
>> solution.
>>
>> To illustrate the problem, consider this URL:
>>
>> http://www.google.com/search?q=%22a%26b%22
>>
>> That's a Google search for the phrase "a&b". Your function turns that
>> into this:
>>
>> http://www.google.com/search?q=%22a&b%22
>>
>> That's a Google search for "a, which returns completely different
>> results.
>>
>> Backing up, you apparently have input that looks like this:
>>
>> http://example.com/path with spaces/
>>
>> That's not a valid URL, so it needs to be fixed somewhere. Ideally it
>> would be fixed at the source, but if that's not an option, you can fix
>> this specific problem simply with str_replace(' ', '%20', $url); That
>> won't break anything else because spaces aren't URL delimiters. I'm
>> guessing your input has more complex problems with invalid URLs as
>> your attempted solution is more broad in scope. It's hard to say what
>> you should do without knowing more about the input. What does the raw
>> XML look like?
>>
>> [1] http://www.codinghorror.com/blog/2008/06/regular-expressions-now-you-have-two-problems.html
>>
>> --
>> Scott Reynen
>> MakeDataMakeSense.com
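
For reference, here is a rough sketch of the "break it up into component
parts" approach Scott describes, using parse_url(). The function names
encode_component() and encode_url_by_parts() are hypothetical and are not
from either module. It assumes an absolute URL with a scheme and host, a
query string of key=value pairs joined by "&" in which "+" stands for a
space, and it only fixes spaces in the fragment. It has not been tested
against the module's real input.

```php
<?php
// Hypothetical helper (not from the thread): decode first so existing
// escapes such as %26 are not double-encoded, then encode again.
function encode_component($part, $is_query = false) {
  // Assumption: in query strings "+" means a space (form-style encoding).
  $decoded = $is_query ? urldecode($part) : rawurldecode($part);
  return rawurlencode($decoded);
}

// Hypothetical function: encode each URL component separately, so that
// delimiters are never encoded and data is never left unencoded.
function encode_url_by_parts($url) {
  $parts = parse_url($url);
  if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
    return $url; // not an absolute URL we understand; leave it untouched
  }

  $out = $parts['scheme'] . '://';
  if (isset($parts['user'])) {
    $out .= $parts['user'];
    if (isset($parts['pass'])) {
      $out .= ':' . $parts['pass'];
    }
    $out .= '@';
  }
  $out .= $parts['host'];
  if (isset($parts['port'])) {
    $out .= ':' . $parts['port'];
  }

  // Path: "/" is a delimiter here, so encode segment by segment.
  if (isset($parts['path'])) {
    $segments = explode('/', $parts['path']);
    $out .= implode('/', array_map('encode_component', $segments));
  }

  // Query: "&" and "=" are delimiters; keys and values are data.
  if (isset($parts['query'])) {
    $pairs = array();
    foreach (explode('&', $parts['query']) as $pair) {
      $kv = explode('=', $pair, 2);
      $encoded = encode_component($kv[0], true);
      if (isset($kv[1])) {
        $encoded .= '=' . encode_component($kv[1], true);
      }
      $pairs[] = $encoded;
    }
    $out .= '?' . implode('&', $pairs);
  }

  // Fragment: only spaces are fixed, a deliberate simplification.
  if (isset($parts['fragment'])) {
    $out .= '#' . str_replace(' ', '%20', $parts['fragment']);
  }
  return $out;
}
```

With this sketch, http://www.google.com/search?q=%22a%26b%22 comes back
unchanged, and "http://example.com/path with spaces/" becomes
http://example.com/path%20with%20spaces/. A stray "%" that is not part of
an escape will be treated as data and encoded to %25, which may or may not
be what the feed intended.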
--
Regards,
Nitin Kumar Gupta
http://publicmind.in/blog/
