RE: extracting links.. continued..

Lorne Easton Wed, 16 Jan 2002 16:07:15 -0800

Hi there,

Thanks for the advice. I looked at using HTML::LinkExtor but decided against
it.

I am using code like the following:

sub get_urls {
    my ($data) = @_;
    my @url_array;

    print $data;

    # Put all "<a href" links into @url_array.
    # .*? is non-greedy, so two links on one line are captured separately;
    # /s lets a link span lines, /i ignores case, /g finds every match.
    while ($data =~ m|(<a href.*?</a>)|gis) {
        my $temp_tag = $1;

        # Strip out tags
        # Insert code here..

        push @url_array, $temp_tag;
    }

    # Temporary: print out all URLs. Testing purposes only.
    foreach my $temp (@url_array) {
        print $temp, "\n";
    }
    # Use the element count; $#url_array is the last index (one less).
    print "\n\n", scalar(@url_array), " URLs found.\n";
    #####################################################################

    return @url_array;
}

Which is cool, but it extracts the entire <A HREF="URL">TEXT</A> element
rather than just the URL. Is there a way to modify this regexp to strip out
the rest as well? Obviously a match (m//) includes all of the matched data.
Is there any way of changing this, or perhaps of writing a regexp that does
it?
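On the inclusive-match point: the whole pattern still has to match, but only the parts wrapped in parentheses are captured into $1, $2 and so on. A minimal sketch, assuming the tag is written exactly as <a href="...">; the sub name first_href and the sample HTML are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: return the URL from the first <a href="..."> tag.
# The pattern matches the tag's opening, but only (.*?) is kept in $1.
sub first_href {
    my ($html) = @_;
    if ($html =~ m{<a href="(.*?)"}i) {
        return $1;    # just the URL, not the whole tag
    }
    return;           # nothing found: undef in scalar context
}

print first_href('<a href="http://www.example.com/">TEXT</a>'), "\n";
# prints http://www.example.com/
```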

The problem is that the data could look like

<a href = "
<a href="

etc.
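Since those two forms differ only in whitespace, \s* (zero or more whitespace characters) on each side of the = covers both. A sketch, assuming double-quoted values and that href is the first attribute in the tag; hrefs_with_spacing is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: collect every double-quoted href value, allowing
# any whitespace around the "=" (href="...", href = "...", and so on).
sub hrefs_with_spacing {
    my ($html) = @_;
    my @urls;
    while ($html =~ m{<a\s+href\s*=\s*"([^"]*)"}gi) {
        push @urls, $1;    # only the text between the quotes
    }
    return @urls;
}

my $sample = '<a href = "one.html">1</a> <a href="two.html">2</a>';
print join(",", hrefs_with_spacing($sample)), "\n";    # prints one.html,two.html
```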

Perhaps something that grabs all the data in between the quotes would be
useful.
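Grabbing everything between the quotes can be done with a capture that stops at the matching closing quote. This sketch also accepts single quotes and other attributes before href; quoted_hrefs is a hypothetical name, and unquoted values (href=foo.html) are deliberately not handled:

```perl
use strict;
use warnings;

# Hypothetical helper: return every href value, whether it is wrapped in
# double or single quotes. (["']) captures the opening quote and \1
# requires the same character to close it; the URL itself is in $2.
# [^>]*? lets other attributes appear before href without leaving the tag.
sub quoted_hrefs {
    my ($html) = @_;
    my @urls;
    while ($html =~ m{<a\s[^>]*?\bhref\s*=\s*(["'])(.*?)\1}gis) {
        push @urls, $2;
    }
    return @urls;
}

my $html = qq{<a href="a.html">A</a>\n<a class="nav" href = 'b.html'>B</a>};
print "$_\n" for quoted_hrefs($html);    # prints a.html, then b.html
```

It is still a heuristic: HTML comments, unquoted attributes and other oddities will trip it up, which is the usual argument for a real parser such as HTML::LinkExtor.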

Any ideas would be appreciated.

Cheers,
Lorne