Hi, I'm trying to extract the email addresses from a web page using the HTML::TreeBuilder::XPath module. I don't really understand XML, and it's been a while since I worked with Perl*. So far I have cobbled together some code from examples I found online. The HTML for the email looks like this:
<li class="ii">Email: <a href="mailto:n...@place.edu">n...@place.yyy</a></li>

The code I put together is:

#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $html = HTML::TreeBuilder::XPath->new;
my $root = $html->parse_file( 'file.htm' );
my @email = $root->findnodes( q{//a} );
for my $email (@email) {
    print $email->attr('href');
}

The problem is that it also outputs the link found in another part of the HTML (<a href="http://sites.place.yyy/name">), so I get a list of websites and emails, one after another. How can I output only the email addresses?

I also don't understand how the path in "findnodes( q{//a} )" works. What is the "q" for? How do I make sense of the structure of nodes?

Thanks for any advice,
JJ

*I'm not a programmer; I have a list to compile for work and thought I might automate it to make my life easier.
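
For reference, here is one possible way to narrow the match, offered as a sketch rather than a definitive answer. In XPath, //a selects every <a> element anywhere in the document, and a predicate in square brackets filters that set, so //a[starts-with(@href, "mailto:")] should keep only links whose href begins with "mailto:". The q{...} construct is Perl's single-quote operator, which is simply another way to write a single-quoted string. The sample markup and the example.edu addresses below are placeholders made up for illustration, not taken from the real page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Sample markup modeled on the snippet above; the addresses are placeholders.
my $html = <<'HTML';
<ul>
  <li class="ii">Email: <a href="mailto:someone@example.edu">someone@example.edu</a></li>
  <li class="ii">Web: <a href="http://sites.example.edu/name">home page</a></li>
</ul>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($html);    # use parse_file('file.htm') for a file on disk

# q{...} is a single-quoted string, so the double quotes inside the
# XPath expression need no escaping.
my @links = $tree->findnodes( q{//a[starts-with(@href, "mailto:")]} );

for my $link (@links) {
    my $address = $link->attr('href');
    $address =~ s/^mailto://;   # strip the scheme, leaving the bare address
    print "$address\n";
}

$tree->delete;                  # release the memory held by the tree
```

With the original file, replacing parse_content($html) with parse_file('file.htm') and keeping the same XPath expression should skip the http:// links, assuming every email link on the page uses a mailto: href.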