Hi Kyle,
> I'm the author of extract_url.pl, so perhaps I can shed some light
> here.
Thanks.
> The *correct* place to "fix" the issue of escaping (or otherwise
> sanitizing) ampersands is in the sanitizeuri function (line 208). The
> current version of extract_url.pl uses this:
>
> sub sanitizeuri {
>     my($uri) = @_;
>     $uri =~ s/([^a-zA-Z0-9_.!*()\@&:=\?\/%~+-])/sprintf("%%%X",ord($1))/egs;
>     return $uri;
> }
I have now tried your fix, and it didn't work for me: my browser doesn't
find the resulting pages when the URL has ampersands that are converted
to %26 (probably because the % itself is further encoded as %25 by the
browser before being sent to the server (?)).
> ...
> I've personally never had a problem with ampersands, and I'm not sure
> why some people do. Extract_url.pl constructs system commands like so:
>
> /path/to/handler 'http://url.with/an&ersand'
I changed my handler to '/bin/echo %s >>tmp.txt' and it wrote the correct
result, so I guess you're right here.
> ... which should be perfectly safe and work just fine (and does for
> me). I suspect the problem stems from using another wrapper script (e.g.
> /etc/urlhandler/urlhandler.sh). I bet that the wrapper script is not
> properly quoting its first argument.
I don't know much about shell programming, but I found that
/etc/urlhandler/url_handler.sh is a shell script that obtains its URL by
doing '$url=$1'. I replaced the whole handler with the following program:

#! /bin/bash
url=$1; shift
echo $url >>tmp.txt;

and found that the URL is cut short at the first ampersand. I don't
understand why echo by itself yields the correct result (above) while
echo through a bash script yields the truncated result.

Thanks and best regards,
Luis

-- 
                                                                  o
W. Luis Mochán,                      | tel:(52)(777)329-1734     /<(*)
Instituto de Ciencias Físicas, UNAM  | fax:(52)(777)317-5388     `>/   /\
Apdo. Postal 48-3, 62251             |                           (*)/\/  \
Cuernavaca, Morelos, México          | [email protected]   /\_/\__/
GPG: DD344B85,  2ADC B65A 5499 C2D3  4A3B 93F3 AE20 0F5E DD34 4B85
