It looks like you need to resolve relative URLs to absolute URLs. Try the following:

#!/usr/bin/perl -w

use strict;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI::URL;

my $ua = LWP::UserAgent->new;
my $request = HTTP::Request->new(GET => 'http://www.google.com');
my $rep = $ua->request($request);
die "request failed: ", $rep->status_line, "\n" unless $rep->is_success;
my @links = ();
my $base = $rep->base();
my $lx = HTML::LinkExtor->new(\&callback);

$lx->parse($rep->content);
$lx->eof;    #-- tell the parser there is no more input

foreach my $link (@links)
{
        #-- print the link as it appears in the page (possibly relative)
        print "short: $link\n";

        #-- resolve the link against the base URL of the response
        $link = url($link,$base)->abs;

        #-- print the absolute URL
        print "full: $link\n";
}

sub callback {
        my($tag,%links) = @_;

        #-- only image tags
        return unless $tag eq 'img';

        #-- collect every URL-valued attribute of the tag
        push(@links, values %links);
}
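
The script above only prints the resolved links. If you also want the
HTML itself rewritten, as in your <img> example, here is a minimal
sketch. It assumes a simple regex over src/href attributes is good
enough for your pages; absolutize_html is a made-up helper name, not
part of libwww, and a real parser (HTML::Parser) would be more robust.
URI->new_abs comes from the URI module that LWP already depends on.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;

#-- hypothetical helper: rewrite src/href values to absolute URLs.
#-- regex-based, so it only handles simple quoted or unquoted values.
sub absolutize_html {
        my ($html, $base) = @_;

        #-- $1 = attribute name, $2 = optional quote, $3 = URL value
        $html =~ s{\b(src|href)\s*=\s*(["']?)([^"'\s>]+)\2}
                  {$1 . '=' . $2 . URI->new_abs($3, $base) . $2}gie;

        return $html;
}

my $html = '<img src=images/hp1.gif width=50 height=78 alt="">';

#-- prints: <img src=http://www.google.com/images/hp1.gif width=50 height=78 alt="">
print absolutize_html($html, 'http://www.google.com/'), "\n";
```

In the script above you would pass $rep->content and $rep->base instead
of the hard-coded string and URL. URLs that are already absolute are
left unchanged by new_abs.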

Modify the above to suit your needs :-) Hope that helps.

david

Werner Schalk wrote:

> Hello,
> 
> maybe this is off-topic but
> I'm sure people know how to help
> me:
> 
> I print a website to stdout
> using libwww and it works great,
> but the problem is that self-referencing
> URLs must be replaced with the full URL
> of the site I just requested.
> For example, I fetch www.google.com, but the
> images and links on that site then
> point to my own host instead of www.google.com.
> The problem with this is that in a
> third-party app which uses the
> output of libwww links and references
> are made to localhost but not to
> www.google.com...
> 
> For instance www.google.com contains this line:
> <img src=images/hp1.gif width=50 height=78 alt="">
> 
> It should be altered to:
> <img src=http://www.google.com/images/hp1.gif width=50 height=78 alt="">
> 
> Is there a way to do this with
> libwww or even with sed?
> 
> Bye and thanks,
> Werner.
