On Mon, Jan 28, 2013 at 8:21 AM, Brandon McCaig <bamcc...@gmail.com> wrote:
> On Sat, Jan 26, 2013 at 06:16:14PM -0800, Jim Gibson wrote:
>> Better add periods to that regular expression character class:
>>
>>     if( $link =~ /mailto:([\w@.]+)/ ) {
>>
>> … or include everything up to but not including the second double-quote:
>>
>>     if( $link =~ /"mailto:([^"]+)/ ) {
>
> I've never used HTML::TreeBuilder::XPath, but I highly doubt that
> the attr method would return the quotes (and if it did, they
> could be single-quotes instead). It would probably be best to
> find a module that knows how to properly parse mailto URIs, but
> failing that I think that matching everything *from the
> beginning*[1] up to a literal '?' should suffice.
>
> [1] You may wish to tolerate leading white space too, but I'm not
> sure if that is valid.
>
>     if($link =~ /^mailto:([^\?]+)/) {
>         my $email = $1;
>         ...
>
> Untested, but can't *possibly* fail. ;)
A module is a good idea, since URI will parse a valid mailto URI and
ignore leading white space. Note, however, that a mailto URI may hold
several comma-separated addresses. See: perldoc URI (and URI::mailto).

    use URI;

    my $uri = URI->new($link);

    # scheme() is undef for relative links, so default it to ''
    if ( ( $uri->scheme // '' ) eq 'mailto' ) {
        # to() drops any ?subject=... part; the result may hold
        # several comma-separated addresses
        my $email = $uri->to;
        ...
    }

--
Charles DeRykus
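Building on the point about comma-separated addresses, here is a minimal sketch that turns one link into a list of addresses. It assumes the CPAN URI distribution (which supplies URI::mailto and its to() method) is installed; the helper name extract_emails and the sample links are hypothetical.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: return the addresses in a mailto link,
# or an empty list if $link is not a mailto URI.
sub extract_emails {
    my ($link) = @_;

    # URI->new strips leading and trailing white space from its argument.
    my $uri = URI->new($link);

    # scheme() is undef for relative links such as "contact.html".
    return () unless ( $uri->scheme // '' ) eq 'mailto';

    # to() (from URI::mailto) ignores headers such as ?subject=...
    # and joins the "to" values with commas; split them back apart.
    return grep { length } split /\s*,\s*/, $uri->to;
}

for my $link ( '  mailto:joe@example.com?subject=Hello',
               'mailto:ann@example.com,bob@example.com',
               'contact.html' ) {
    my @emails = extract_emails($link);
    print "$link => @emails\n";
}
```

Splitting on commas is a simplification: an address whose quoted local part contains a comma would be split incorrectly, and that case needs a real address parser.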