On Mon, Jan 28, 2013 at 8:21 AM, Brandon McCaig <bamcc...@gmail.com> wrote:
> On Sat, Jan 26, 2013 at 06:16:14PM -0800, Jim Gibson wrote:
>> Better add periods to that regular expression character class:
>>
>>   if( $link =~ /mailto:([\w@.]+)/ ) {
>>
>> … or include everything up to but not including the second double-quote:
>>
>>   if( $link =~ /"mailto:([^"]+)/ ) {
>
> I've never used HTML::TreeBuilder::XPath, but I highly doubt that
> the attr method would return the quotes (and if it did, they
> could be single-quotes instead). It would probably be best to
> find a module that knows how to properly parse mailto URIs, but
> failing that I think that matching everything *from the
> beginning*[1] up to a literal '?' should suffice.
>
> [1] You may wish to tolerate leading white space too, but I'm not
> sure if that is valid.
>
> if($link =~ /^mailto:([^\?]+)/) {
>     my $email = $1;
>     ...
>
> Untested, but can't *possibly* fail. ;)

A module is a good idea: URI will parse a valid mailto URI
and ignore leading whitespace. Note, however, that there
may be multiple comma-separated addresses.

See: perldoc URI.

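use URI;

my $uri = URI->new($link);
# scheme is undef for a relative URI, so default to '' before comparing
if ( ( $uri->scheme // '' ) eq 'mailto' ) {
     # to() is the accessor URI::mailto documents for the address part
     my $email = $uri->to;
     ...
}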
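
For the multiple-address case, here is a minimal sketch. It assumes the
URI distribution from CPAN is installed and uses the to() accessor
documented in URI::mailto; mailto_addresses is a hypothetical helper
name of my own, not part of any module. It only splits on commas and
does not validate the addresses.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: return every address found in a mailto link,
# or an empty list if the link is not a mailto URI.
sub mailto_addresses {
    my ($link) = @_;

    # URI->new strips leading and trailing whitespace for us
    my $uri = URI->new($link);
    return unless ( $uri->scheme // '' ) eq 'mailto';

    # to() returns the recipients as one comma-joined string
    my $to = $uri->to;
    return unless defined $to && length $to;

    my @addresses;
    for my $addr ( split /,/, $to ) {
        $addr =~ s/^\s+//;    # trim stray whitespace around each entry
        $addr =~ s/\s+$//;
        push @addresses, $addr if length $addr;
    }
    return @addresses;
}

# Usage: print one address per line
print "$_\n"
    for mailto_addresses('mailto:a@example.com,b@example.com?subject=Hi');
```

Anything after the '?' (subject, body, and so on) is available
separately through $uri->headers if you need it.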

--
Charles DeRykus


