On 3/31/06 Joseph Hourcle wrote:
>On Mar 29, 2006, at 1:44 PM, John Gold wrote:
>>>> Is there a grep pattern to format email addresses and web site
>>>> addresses? 

[snip]

>Just because it starts in 'www' or ends in '.com' doesn't
>mean it's a reference to an HTTP server.
>
>What you're asking for is actually two things:
>   1. find things that are getting turned into hyperlinks
>   2. turn the things into hyperlinks
>
>The second one is easy, when you know what the proper protocol
>and such:

[snip]

>-- but the first one is a royal pain.

Joe's correct.

Here's a subroutine that I've used, following an example in the Perl
Cookbook by Christiansen & Torkington. Watch out for email line
wrapping.

You could make a BBEdit Perl filter from it.

It will only convert a URL that lacks a protocol (e.g. 'http') if the
entire text handed to it is the web address, and you pass a '1', or any
other true value, as the second argument to the subroutine (guess_ok).

And this DOES NOT work with all possible resource addresses.
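
To make the guess_ok special case concrete, here is a minimal stand-alone
sketch of just that step. The name guess_link and the \S+ whole-text test
are my own simplifications for illustration, not what urlify itself uses;
the protocol list is the one from the subroutine below.

```perl
use strict;
use warnings;

# Hypothetical helper: add a protocol only when the whole text is one address.
sub guess_link {
    my ($text) = @_;

    # Guess only if the entire text is a single run of non-space characters.
    return $text unless $text =~ /^\s*(\S+)\s*$/;
    my $addr = $1;

    # Leave it alone if it already starts with a known protocol.
    return $text if $addr =~ /^(?:http|file|ftp|mailto|afp):/i;

    # An '@' suggests an email address; anything else is assumed to be a web host.
    my $scheme = $addr =~ /\@/ ? 'mailto:' : 'http://';
    return qq{<a href="$scheme$addr">$addr</a>};
}

print guess_link('www.example.com'), "\n";
print guess_link('see www.example.com'), "\n";   # not the whole text: unchanged
```

Text that contains anything besides the address comes back untouched,
which is why the guess is only safe when you already know the input is a
single address.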

## URL converter  ###########################
# by Bruce Van Allen, [EMAIL PROTECTED]
# after Christiansen & Torkington
#
# rev 1.2  7/13/01
#
# Converts most URLs within a text to HTML links.
# Also converts most plain email addresses
#
# Rules: 
#   - URLs must start with a protocol (http, ftp, etc)
#      unless an assumed URL without protocol is the entire text;
#   - URLs & email bounded by < and > will be converted;
#   - URLs & email bounded by quotes (" or ') will *not* be converted;
#   - existing link markup left alone;
#   - other markup preserved;

sub urlify {
    my $self    = shift;
    unshift( @_, $self ) unless ref($self);
    my ($text, $guess_ok, $urls, $ltrs, $gunk, $punc, $ante, $any);
    # Get the original text
    return '' unless $text = shift;
    return '' if $text =~ /^\s*$/;
    $guess_ok   = shift || '';
    # Make some regex pieces
    $urls = '(http|file|ftp|mailto|afp)';
    $ltrs = '\w';
    $gunk = '/#~:.?+=&[EMAIL PROTECTED]';  # $gunk & $punc overlap
    $punc = '.:?\-';
    $ante = '=\"\'';
    $any  = "${ltrs}${gunk}${punc}";
    
    ## Special case to add http/mailto protocol to assumed url
    if ($guess_ok and $text =~ /^\s*([$any]+)\s*$/) {
        my $addr = $1;
        if ($addr !~ /^$urls:/) {
            $addr =~ /\@/
                and $text = qq{<a href="mailto:$addr">$addr</a>}
                    or $text = qq{<a href="http://$addr">$addr</a>};
        }
    }

    # First convert plain email addresses
    $text =~ s{((\s*<?)(mailto:)?\b([EMAIL PROTECTED])\s*(?!</a>)(?=[$punc]*[^${any}]|$))}
              {$3 ? $1 : qq{$2<a href="mailto:$4">$4</a>} }egoi;

    # Convert URLs
    $text =~ s{(([$ante]?)(\s*<?)\b(${urls}:[$any]+?)\s*(?!</a>)(?=[$punc]*[^$any]|$))}
              {$2 ? $1 : qq{$3<a href="$4">$4</a>}}egoi;
    $text =~ s/ $//g;  # ****
    
    return $text
}
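
The part of the URL pattern that is easiest to misread is the non-greedy
match followed by a lookahead, which keeps sentence punctuation out of
the link. Here is a stripped-down sketch of only that idea. The name
link_urls, the shortened protocol list, and the $gunk character set are
assumptions of mine for illustration; it does none of the quote, markup,
or email handling that urlify does.

```perl
use strict;
use warnings;

# Assumed regex pieces, for illustration only.
my $urls = '(?:http|https|ftp)';
my $ltrs = '\w';
my $gunk = '/#~:.?+=&%@!\-';   # assumed set of characters allowed in a URL
my $punc = '.:?\-';            # characters that may also end a sentence
my $any  = "${ltrs}${gunk}${punc}";

# Hypothetical helper: link bare URLs, leaving trailing punctuation outside.
sub link_urls {
    my ($text) = @_;
    # [$any]+? grows one character at a time until the lookahead succeeds:
    # optional punctuation followed by a non-URL character, or end of text.
    $text =~ s{\b(${urls}:[$any]+?)(?=[$punc]*[^$any]|$)}
              {<a href="$1">$1</a>}gi;
    return $text;
}

print link_urls('See http://www.example.com/faq.html. Thanks.'), "\n";
# prints: See <a href="http://www.example.com/faq.html">http://www.example.com/faq.html</a>. Thanks.
```

The period after "faq.html" stays outside the link because the lookahead
accepts it as punctuation followed by a space, while the period inside
"faq.html" is followed by a URL character and so remains part of the match.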

- Bruce

__bruce__van_allen__santa_cruz__ca__
