Quoting MoldNet Root ([EMAIL PROTECTED]):
> sorry if this is off-topic, but I'd
> like to know which is the best regex to match
> a URL...

Here's my preferred solution. It focuses on matching URLs the way they tend
to appear in emails, rather than on brevity or strict correctness.

my $re = qr{
        ([-a-z]+://|(www|ftp)[-.0-9]*)            # scheme://, or bare www/ftp
        [a-z0-9][-a-z0-9.]+[a-z0-9](:[-0-9a-z]+)? # hostname, maybe port
        (/([^\s><"]*[^\s><".,;)'?!])?)?           # /path, or nothing
        |                                         # or, as a special case,
        (mailto|news|about):[^\s><"]*[^\s><".,;)'?!] # schemes without ://
    }x;

while (<>) {
   print "$&\n" while /$re/g;
}

The basic idea is to be permissive about what we'll allow within a URL, but
restrictive about what the final character can be, so that punctuation which
merely follows a URL in a sentence isn't swallowed. It can therefore correctly
pull out URLs like http://www.google.com/search?q=Lara+Croft, or references to
www.amazon.com. If you're going to put the output straight into an
<a href="">, you'll need some code to add an http:// or ftp:// as necessary.
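
A minimal sketch of that last step might look like the following. The name
add_scheme is made up for illustration, and the rule it applies (anything
starting with "ftp" gets ftp://, everything else without a scheme gets
http://) is an assumption rather than something the regex above guarantees.

```perl
use strict;
use warnings;

# Hypothetical helper: prepend a scheme to a matched string that lacks one,
# so the result can be dropped into an href attribute.
sub add_scheme {
    my ($url) = @_;

    # Already has scheme:// or one of the colon-only schemes: leave it alone.
    return $url if $url =~ m{^(?:[-a-z]+://|(?:mailto|news|about):)};

    # Assumption: a bare host starting with "ftp" is an FTP server.
    return "ftp://$url" if $url =~ /^ftp/;

    # Assumption: everything else is a web address.
    return "http://$url";
}

print add_scheme($_), "\n"
    for qw(www.amazon.com ftp.example.org http://www.google.com/search?q=Lara+Croft);
```

This only fixes up the scheme. Escaping the result for use inside HTML is a
separate job.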

Adam

-- 
Adam Rice -- [EMAIL PROTECTED] -- Blackburn, Lancashire, England
