On 3/31/06 Joseph Hourcle wrote:
>On Mar 29, 2006, at 1:44 PM, John Gold wrote:
>>>> Is there a grep pattern to format email addresses and web site
>>>> addresses?
[snip]
>Just because it starts in 'www' or ends in '.com' doesn't
>mean it's a reference to an HTTP server.
>
>What you're asking for is actually two things:
> 1. find things that are getting turned into hyperlinks
> 2. turn the things into hyperlinks
>
>The second one is easy, when you know what the proper protocol
>is and such:
[snip]
>-- but the first one is a royal pain.
Joe's correct.
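To make the "easy" half concrete: once you already know the address and its protocol, wrapping it in a link is a few lines. This is only a sketch; make_link and escape_html are names made up for illustration, not anything from Joe's snipped example.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML.
sub escape_html {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical helper: wrap an address whose protocol is already known.
sub make_link {
    my ($protocol, $address) = @_;
    my $href  = escape_html("$protocol$address");
    my $label = escape_html($address);
    return qq{<a href="$href">$label</a>};
}

print make_link('http://', 'www.example.com/?a=1&b=2'), "\n";
print make_link('mailto:', 'someone@example.com'), "\n";
```

The hard half, deciding which pieces of running text are addresses at all, is what the rest of this message is about.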
Here's a subroutine that I've used, following an example in the Perl
Cookbook by Christiansen & Torkington. Watch out for email line
wrapping.
You could make a BBEdit Perl filter from it.
It will convert a URL that has no protocol (e.g. 'http') only if the
entire text handed to it is the web address and you pass a '1', or any
other value Perl treats as true, as the second argument to the
subroutine (guess_ok).
And this DOES NOT work with all possible resource addresses.
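As a rough, standalone illustration of that guess_ok behavior (separate from the subroutine itself): guess_link is a made-up name, and the character class here is my assumption about what counts as "address characters", so treat it as a sketch only.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the "guess" case: the WHOLE text must be
# one address, otherwise the text comes back untouched.
sub guess_link {
    my ($text) = @_;
    return $text
        unless defined $text
        && $text =~ m{^\s*([\w/#~:.?+=&%\@!\-]+)\s*\z};   # assumed class
    my $addr = $1;

    # Already has a protocol: nothing to guess.
    return $text if $addr =~ /^(?:http|file|ftp|mailto|afp):/i;

    # An '@' suggests email; anything else is assumed to be a web address.
    my $scheme = $addr =~ /\@/ ? 'mailto:' : 'http://';
    return qq{<a href="$scheme$addr">$addr</a>};
}

print guess_link('www.example.com'), "\n";
print guess_link('someone@example.com'), "\n";
print guess_link('see www.example.com for details'), "\n";   # unchanged
```

The last call shows why the guess is limited to the whole-text case: inside running text there is no reliable way to tell a bare host name from an ordinary word with dots in it.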
## URL converter ###########################
# by Bruce Van Allen, [EMAIL PROTECTED]
# after Christiansen & Torkington
#
# rev 1.2 7/13/01
#
# Converts most URLs within a text to HTML links.
# Also converts most plain email addresses
#
# Rules:
# - URLs must start with a protocol (http, ftp, etc)
# unless an assumed URL without protocol is the entire text;
# - URLs & email bounded by < and > will be converted;
# - URLs & email bounded by quotes (" or ') will *not* be converted;
# - existing link markup left alone;
# - other markup preserved;
sub urlify {
my $self = shift;
unshift( @_, $self ) unless ref($self);
my ($text, $guess_ok, $urls, $ltrs, $gunk, $punc, $ante, $any);
# Get the original text
return '' unless $text = shift;
return '' if $text =~ /^\s*$/;
$guess_ok = shift || '';
# Make some regex pieces
$urls = '(http|file|ftp|mailto|afp)';
$ltrs = '\w';
$gunk = '/#~:.?+=&%@!\-'; # $gunk & $punc overlap; tail as in the Cookbook recipe (the list archive masked it)
$punc = '.:?\-';
$ante = '=\"\'';
$any = "${ltrs}${gunk}${punc}";
## Special case to add http/mailto protocol to assumed url
if ($guess_ok and $text =~ /^\s*([$any]+)\s*$/) {
my $addr = $1;
if ($addr !~ /^$urls:/) {
$addr =~ /\@/
and $text = qq{<a href="mailto:$addr">$addr</a>}
or $text = qq{<a href="http://$addr">$addr</a>};
}
}
# First convert plain email addresses
$text =~
s{((\s*<?)(mailto:)?\b([EMAIL PROTECTED])\s*(?!</a>)(?=[$punc]*[^${any}]|$))}
{$3 ? $1 : qq{$2<a href="mailto:$4">$4</a>} }egoi;
# Convert URLs
$text =~
s{(([$ante]?)(\s*<?)\b(${urls}:[$any]+?)\s*(?!</a>)(?=[$punc]*[^$any]|$))}
{$2 ? $1 : qq{$3<a href="$4">$4</a>}}egoi;
$text =~ s/ $//g; # ****
return $text
}
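
For the BBEdit filter idea, a wrapper along these lines might do. It assumes the filter receives its text on standard input or as a file named on the command line, and that whatever it prints replaces the selection; check how your BBEdit version hands text to filters. run_filter and the stand-in converter are illustrative names only; in practice you would pass a reference to urlify instead.

```perl
use strict;
use warnings;

# Read everything from $in, convert it, write the result to $out.
sub run_filter {
    my ($in, $out, $convert) = @_;
    my $text = do { local $/; <$in> };   # slurp the whole input
    $text = '' unless defined $text;
    print {$out} $convert->($text);
    return;
}

# Stand-in converter so the sketch runs on its own (NOT urlify):
# links anything that starts with http:// or ftp://.
my $stand_in = sub {
    my ($text) = @_;
    $text =~ s{\b((?:http|ftp)://[^\s<>"']+)}{<a href="$1">$1</a>}g;
    return $text;
};

# Demo with in-memory filehandles instead of real input and output.
my $input  = "See http://www.example.com/ for details.\n";
my $output = '';
open my $in,  '<', \$input  or die "Cannot open input: $!";
open my $out, '>', \$output or die "Cannot open output: $!";
run_filter($in, $out, $stand_in);
close $out;
print $output;

# In an actual filter the call would be something like:
# run_filter(\*ARGV, \*STDOUT, \&urlify);
```

Slurping the whole input rather than going line by line matters here, because the guess_ok case only applies when the address is the entire text.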
- Bruce
__bruce__van_allen__santa_cruz__ca__