Quoting MoldNet Root ([EMAIL PROTECTED]):
> sorry if this is off-topic, but i'd
> like to know which is the best regex to match
> a URL...
Here's my preferred solution. It focuses on matching URLs the way they
tend to appear in emails, rather than on brevity or strict correctness.
use strict;
use warnings;

my $re = qr{
    ([-a-z]+://|(www|ftp)[-.0-9]*)                # scheme:// or bare www. / ftp.
    [a-z0-9][-a-z0-9.]+[a-z0-9](:[-0-9a-z]+)?     # hostname, maybe port
    (/([^\s><"]*[^\s><".,;)'?!])?)?               # /path, or nothing
  |                                               # special case:
    (mailto|news|about):[^\s><"]*[^\s><".,;)'?!]  # schemes without ://
}x;

# Print every URL found on each line of input.
while (<>) {
    print "$&\n" while /$re/g;
}
The basic idea is to be permissive about what we allow within a URL, but
restrictive about what the final character can be, so that punctuation from
the surrounding sentence isn't swallowed. It correctly pulls out URLs like
http://www.google.com/search?q=Lara+Croft, as well as bare references to
www.amazon.com. If you're going to put the output straight into an
<a href="">, you'll need some code to add an http:// or ftp:// as necessary.
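A minimal sketch of that last step might look like the following. The
helper name add_scheme and the sample text are made up for illustration,
and the rule "bare ftp... gets ftp://, anything else bare gets http://" is
an assumption rather than anything the regex guarantees. The pattern is
repeated so the snippet runs on its own.

```perl
use strict;
use warnings;

# Same pattern as in the message, repeated so this snippet is self-contained.
my $re = qr{
    ([-a-z]+://|(www|ftp)[-.0-9]*)                # scheme:// or bare www. / ftp.
    [a-z0-9][-a-z0-9.]+[a-z0-9](:[-0-9a-z]+)?     # hostname, maybe port
    (/([^\s><"]*[^\s><".,;)'?!])?)?               # /path, or nothing
  |
    (mailto|news|about):[^\s><"]*[^\s><".,;)'?!]  # schemes without ://
}x;

# add_scheme is a hypothetical helper: it prefixes bare www... / ftp...
# matches with an assumed scheme and leaves everything else untouched.
sub add_scheme {
    my ($url) = @_;
    return $url         if $url =~ m{^[-a-z]+://};               # already scheme://
    return $url         if $url =~ m{^(?:mailto|news|about):};   # scheme without ://
    return "ftp://$url" if $url =~ m{^ftp};                      # bare ftp host (assumed FTP)
    return "http://$url";                                        # bare www host (assumed HTTP)
}

# Sample input, invented for this example.
my $text = "See http://www.google.com/search?q=Lara+Croft, or try www.amazon.com.";

my @links;
while ($text =~ /$re/g) {
    my $found = $&;    # copy the whole match before running other regexes
    push @links, add_scheme($found);
}

print qq{<a href="$_">$_</a>\n} for @links;
```

For the sample line this should produce links to
http://www.google.com/search?q=Lara+Croft and http://www.amazon.com, with
the trailing comma and full stop left out. Real code would also want to
HTML-escape the URL before dropping it into the attribute.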
Adam
--
Adam Rice -- [EMAIL PROTECTED] -- Blackburn, Lancashire, England