-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
On Sunday, March 31 at 11:16 PM, quoth Luis Mochan:
>> I'm a perl guy, yet that's non-trivial here. Thx. :-)
>>
> You're welcome. I don't know if there are other characters that appear
> in a URL and need to be escaped for the shell ([;><]?); they could
> easily be accommodated by modifying 'wlmsanitize'. The page for the
> extract_url project (http://www.memoryhole.net/~kyle/extract_url/)
> mentions that the program already transforms characters dangerous to
> the shell, but then it only mentions explicitly single quotes and
> dollar signs.
Hello,
I'm the author of extract_url.pl, so perhaps I can shed some light
here.
The *correct* place to "fix" the issue of escaping (or otherwise
sanitizing) ampersands is in the sanitizeuri function (line 208). The
current version of extract_url.pl uses this:
    sub sanitizeuri {
        my($uri) = @_;
        $uri =~ s/([^a-zA-Z0-9_.!*()\@&:=\?\/%~+-])/sprintf("%%%X",ord($1))/egs;
        return $uri;
    }
Essentially, that explicitly whitelists the characters
a-z, A-Z, 0-9, _, ., !, *, (, ), @, &, :, =, ?, /, %, ~, +, and - and
turns *anything* else into the percent-encoded equivalent (e.g. %26),
which should be correctly decoded by any standards-compliant
URL-decoder (see RFC 3986). If you want to eliminate ampersands from
the characters allowed in a URL, simply remove the ampersand from that
list. It's as simple as that. I think Luis's patch is a little overly
complicated, and I think the policy of using backslashes to escape
such characters (instead of percent-encoding) is dangerous, given that
it's more likely to be stripped off by intervening scripts. I don't
want future bug reports that say "my setup strips backslashes, so can
you create an option that will triple-backslash the $ character?". :)
(Followed, the next week, by a request for quadruple-backslashing, of
course!)
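As a minimal sketch of that one-line change: the function below mirrors sanitizeuri with the ampersand dropped from the whitelist. The name sanitizeuri_no_amp is hypothetical, and the %02X format is an assumption of this sketch rather than what extract_url.pl ships; it only matters for bytes below 0x10, which would otherwise get a single hex digit.

```perl
use strict;
use warnings;

# Hypothetical variant of sanitizeuri: same whitelist as above, minus '&'.
# %02X (an assumption of this sketch) pads to two hex digits.
sub sanitizeuri_no_amp {
    my ($uri) = @_;
    $uri =~ s/([^a-zA-Z0-9_.!*()\@:=\?\/%~+-])/sprintf("%%%02X", ord($1))/egs;
    return $uri;
}

print sanitizeuri_no_amp('http://url.with/an&ersand'), "\n";
# prints http://url.with/an%26ersand
```

Everything else in the URL passes through untouched, so a handler that decodes percent-encoding should see the original address.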
I've personally never had a problem with ampersands, and I'm not sure
why some people do. Extract_url.pl constructs system commands like so:
    /path/to/handler 'http://url.with/an&ersand'
... which should be perfectly safe and work just fine (and does for
me). I suspect the problem stems from using another wrapper script (e.g.
/etc/urlhandler/urlhandler.sh). I bet that wrapper script is not
properly quoting its first argument.
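To illustrate what forwarding the argument safely could look like, here is a rough sketch of a wrapper written in Perl rather than shell. The helper name run_handler is hypothetical, and the running perl binary ($^X) stands in for a real handler. The list form of a piped open bypasses the shell, so the ampersand is never treated as a control operator.

```perl
use strict;
use warnings;

# Hypothetical helper: run a handler with its arguments passed as a list,
# so no shell ever re-parses the URL. Returns whatever the handler prints.
sub run_handler {
    my ($handler, @args) = @_;
    open(my $fh, '-|', $handler, @args)
        or die "cannot run $handler: $!";
    local $/;               # slurp all output at once
    my $out = <$fh>;
    close($fh);
    return $out;
}

my $url = 'http://url.with/an&ersand';

# Stand-in handler: perl itself, echoing its first argument.
print run_handler($^X, '-e', 'print $ARGV[0]', $url), "\n";
```

A shell wrapper would get the same effect by always double-quoting the positional parameter it forwards.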
In any event, percent-encoding, by modifying that one line, is
probably the right way to go.
~Kyle
- --
The purpose of computing is insight, not numbers.
-- Richard W. Hamming
-----BEGIN PGP SIGNATURE-----
Comment: Thank you for using encryption!
iQIcBAEBCAAGBQJRWfZoAAoJECuveozR/AWeH4oQAKRu3Jg1n7KVXT0q0DogCoE+
Ms/gH8EKUwN8KtWhg3wNDgCIh0GXaNykywQPshbM59qP6U8uFofavngGfQv1YCEV
vM94vsNLY8AOfdv/6tRkQFKDi5RadKRfjcJYqHzr11LSJ2e+Ns+i4gx+0jkSCe9/
2FIWjZjsmH5WUHNktAzC0dCGxqBb6vO4Oc7JRuLpaof6jLWLMvJBgM9HVCf67RrX
aEALusVBqSZKBlr+UBk1lF0obEbijGX+hJuHg8udaOVgCsljpzDcOku5my2V13Pu
LZ1ltKv4/y+Z2tofyjDpXNnsomENYfWb6LGfQgystY8xvSv94TJLOlM7oaSsJmJq
hPdP0T5rJ3lryaadc3I5p7GUI5zqUk0T6e8FM8vM1ZUXS8NyN0ZN7NeSSX/5mAMS
OCCkxxXSaLnbr2HUetjYknnVB4W6WKR2eEjgP+VHMtemRb9W6UVgjO1nnoqm4WOM
zRPDIk6VvJgTPUuIso5oq2JoYC0wowmXJBz31UL6y98p1zcPcZVPFDxtf/9p6pUV
/VTDD4bPZCSaQiwhr2abUd4OxOd5bpYx994Z7L5oCQezGDXhEt6XgeEdGBdT21bt
z8FKnqGNOp0EO9C2kX9fPGbRITXK32urUEqeuuB0AHDp3D7VyZ3KRiXIeFFRWvMj
kQzyzKnbnm1uHloyk89l
=n1YG
-----END PGP SIGNATURE-----