It looks like you need to resolve relative URLs to full (absolute) URLs. Try the following:
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI::URL;

my $ua      = LWP::UserAgent->new;
my $request = HTTP::Request->new(GET => 'http://www.google.com');
my $rep     = $ua->request($request);
die "request failed: ", $rep->status_line, "\n" unless $rep->is_success;

my @links = ();
#-- base URL of the response, used to resolve relative links
my $base = $rep->base();
my $lx = HTML::LinkExtor->new(\&callback);
$lx->parse($rep->content);
#-- signal end of document to the parser
$lx->eof;

foreach my $link (@links)
{
    #-- prints the relative URL
    print "short: $link\n";
    #-- resolve the link against the base URL
    $link = url($link, $base)->abs;
    #-- prints the full URL
    print "full: $link\n";
}

#-- called by HTML::LinkExtor for each tag that carries links
sub callback {
    my ($tag, %links) = @_;
    #-- only image tags
    return unless $tag eq 'img';
    push(@links, values %links);
}
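The script above only prints the resolved links. If the goal is to rewrite the HTML itself, as in the img example from the question, a rough sketch might look like the following. It assumes the URI module is installed (LWP depends on it); the helper name absolutize_html is made up for illustration, and the regex only copes with simple src/href attributes, so treat it as a starting point rather than a robust HTML rewriter.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: make src/href attribute values in $html absolute
# with respect to $base. Regex-based, so it only handles simple markup.
sub absolutize_html {
    my ($html, $base) = @_;
    $html =~ s{\b(src|href)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))}{
        my $attr = $1;
        # the value may be double-quoted, single-quoted or unquoted
        my $url  = defined($2) ? $2 : defined($3) ? $3 : $4;
        $attr . '="' . URI->new_abs($url, $base) . '"';
    }gie;
    return $html;
}

my $html = '<img src=images/hp1.gif width=50 height=78 alt="">';
print absolutize_html($html, 'http://www.google.com/'), "\n";
# prints: <img src="http://www.google.com/images/hp1.gif" width=50 height=78 alt="">
```

In the script above you would pass $rep->content and $rep->base to the helper instead of the literal strings used here.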
Modify the above to suit your needs :-) Hope that helps.
david
Werner Schalk wrote:
> Hello,
>
> maybe this is off-topic but
> I'm sure people know how to help
> me:
>
> I print a website to stdout
> using libwww and it works great,
> but the problem is that self-referencing
> URLs must be replaced with the full URL
> of the site I just requested.
> For example, I fetch www.google.com, but the
> images and links of this site
> point to my own host, not to www.google.com.
> The problem with this is that in a
> third-party app which uses the
> output of libwww, links and references
> are made to localhost but not to
> www.google.com...
>
> For instance www.google.com contains this line:
> <img src=images/hp1.gif width=50 height=78 alt="">
>
> It should be altered to:
> <img src=http://www.google.com/images/hp1.gif width=50 height=78 alt="">
>
> Is there a way to do this with
> libwww or even with sed?
>
> Bye and thanks,
> Werner.