Hi, I have been working with the libwww-perl package for some time and I must
say that I couldn't live without it. :-)
I have been working on using the WWW::RobotRules module and it was quite
simple to add to my code, but I have two questions.
1) Have I understood correctly that I only need to fetch and parse a site's
robots.txt once per program run, so that if I visit other sites and come back
to this one later, I don't have to check it again?
2) I would like to avoid reading robots.txt more than once (just to be nice),
and I am looking for a way to check that. What I came up with is this
(please excuse my poor Perl skills :)):
-[example]---
use LWP::Simple;
use WWW::RobotRules;

my $robotsrules = WWW::RobotRules->new('MyBot/1.0');   # agent name is just an example
my $RobotSites  = {};   # remembers which hosts' robots.txt we have already parsed

sub get_url {
    my $SiteMainUrl  = "http://some.other.site/";
    my $SiteUrlToGet = "http://some.other.site/somepage.html";

    # Fetch and parse robots.txt only the first time we see this host
    unless ($RobotSites->{$SiteMainUrl}) {
        my $Roboturl   = $SiteMainUrl . "robots.txt";
        my $robots_txt = get($Roboturl);
        $robotsrules->parse($Roboturl, $robots_txt);
        $RobotSites->{$SiteMainUrl} = 1;
    }

    if ($robotsrules->allowed($SiteUrlToGet)) {
        #
        # Do the url
        #
    }
}
-----------
Is there a better or more correct way to do this? If WWW::RobotRules remembers
the sites it has already parsed, could I use it to verify that it has the
information, so that I would not need to fetch robots.txt again?
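(I also wondered whether LWP::RobotUA, which ships with libwww-perl, might do
all of this for me. If I have read its documentation right, it fetches and
caches robots.txt per host on its own; here is a rough sketch of what I imagine,
with the agent name and e-mail address made up by me:)

-[example]---
use LWP::RobotUA;
use HTTP::Request;

# LWP::RobotUA consults and caches robots.txt for each host by itself,
# and also waits politely between requests to the same server.
my $ua = LWP::RobotUA->new('MyBot/1.0', 'martin@example.com');  # made-up agent name and address
$ua->delay(1);    # minutes to wait between requests to one host

my $req      = HTTP::Request->new(GET => 'http://some.other.site/somepage.html');
my $response = $ua->request($req);   # refused automatically if robots.txt disallows it
if ($response->is_success) {
    #
    # Do the url
    #
}
-----------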
Best Regards
/Martin