In article <[EMAIL PROTECTED]>, Lorne Easton wrote:
> I need to write some code that extracts hyperlinks from a
> scalar ($data) and puts them into an array.
>
> I imagine that grep can do this, but my mastery of it and of
> regular expressions is not brilliant.
>
> Can you please provide some example code, or at least point me in the right
> direction?

If you only need the URLs of the hyperlinks, then HTML::LinkExtor is
just what you need; it ships with the HTML::Parser distribution.
HTML::SimpleLinkExtor might be worth a try too.

http://search.cpan.org/search?dist=HTML-SimpleLinkExtor
http://search.cpan.org/search?dist=HTML-Parser

Otherwise, if you want both the URLs and the text inside the links,
something like the following might work:

#!/usr/bin/perl -w
use strict;
use HTML::Parser 3;

my $data = <<'_HTML_';
<p><a href="http://foo">bar</a><br>
foo text baz <a href="http://baz">quux</a></p>
_HTML_

my @links = parse_links($data);

# Print the links we found
my $count;
foreach (@links) {
    print ++$count . ". Description: $_->[1]\n   URL: $_->[0]\n\n";
}

sub parse_links {
    my $data = shift;
    my ( @links, $inside );

    # Prepare the parser
    my $linkparser = HTML::Parser->new(
        report_tags   => ['a'],    # Only deal with <a> tags
        unbroken_text => 1,        # Deliver each run of text in one piece

        # Called each time an <a ...> start tag is found
        start_h => [
            sub {
                my $href = shift->{href};

                # Skip anchors without an HREF, e.g. <a name="...">
                return unless defined $href;

                # Store the HREF with an empty description for now
                push @links, [ $href, '' ];

                # Remember that we're inside an <a> element
                $inside = 1;
            },
            'attr'
        ],

        # Called when </a> is found
        end_h => [ sub { $inside = 0 }, '' ],

        # Called when text is found
        text_h => [
            sub {
                # We're only interested in text inside <a>...</a>
                return unless $inside;

                # Append the text to the most recently stored link, so
                # that nested markup such as <a><b>x</b> y</a> still
                # yields a single [ URL, text ] pair
                $links[-1][1] .= shift;
            },
            'dtext'
        ],
    );

    # Launch the parser
    $linkparser->parse($data)->eof();

    return wantarray ? @links : \@links;
}

__END__

-- 
briac
  A flying swallow.
  A fox stalks under a she-oak.
  A nesting dove.
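
P.S. For the URLs-only case, here is a minimal sketch of how
HTML::LinkExtor could be used. The helper name extract_urls() is made up
for this example and is not part of the module. Keeping only the HREF of
<a> tags is my assumption about what "hyperlinks" means here;
HTML::LinkExtor also reports links from other elements such as <img>.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;

# extract_urls() is a hypothetical helper written for this example.
sub extract_urls {
    my $html = shift;

    # Without a callback, the extractor collects the links internally
    my $extor = HTML::LinkExtor->new;
    $extor->parse($html);
    $extor->eof;

    my @urls;
    for my $link ( $extor->links ) {
        # Each entry looks like [ $tag, $attr_name => $url, ... ]
        my ( $tag, %attr ) = @$link;

        # Assumption: only <a href="..."> counts as a hyperlink
        next unless $tag eq 'a' && defined $attr{href};
        push @urls, $attr{href};
    }
    return @urls;
}

my $data = <<'_HTML_';
<p><a href="http://foo">bar</a><br>
foo text baz <a href="http://baz">quux</a></p>
_HTML_

# Put the hyperlinks into an array, then print one URL per line
my @urls = extract_urls($data);
print "$_\n" for @urls;
```

No base URL is passed to new() here, so relative links should come back
unchanged; passing one as the second argument makes the module
absolutize them.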