Hi, I have been working with the libwww-perl package for some time and I
must say that I couldn't live without it. :-)
I have been working on using the WWW::RobotRules package and it was quite
simple to add to my code. But I have two questions.

1) Have I understood correctly that I only need to fetch and parse a
site's robots.txt once (per program run)? In other words, if I go off and
visit other sites and later come back to this one, I don't have to check
it again?
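
To illustrate what I mean (this is only my guess at how it works, using a
made-up site name):
-[example]---

use LWP::Simple qw(get);
use WWW::RobotRules;

my $rules = WWW::RobotRules->new('MyRobot/1.0');

# Fetch and parse robots.txt for the site once...
my $robots_url = "http://some.other.site/robots.txt";
my $robots_txt = get($robots_url);
$rules->parse($robots_url, $robots_txt);

# ...and then, as I understand it, every later allowed() call for this
# site reuses the rules parsed above, even if I fetch pages from other
# sites in between and come back to this one later?
print "allowed\n"
  if $rules->allowed("http://some.other.site/somepage.html");
-----------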

2) I would like to avoid reading robots.txt more than once per site (just
to be nice), and I am looking for a way to check for that. What I came up
with is this (please excuse my poor Perl skills :)):
-[example]---

use LWP::Simple qw(get);
use WWW::RobotRules;

my $robotsrules = WWW::RobotRules->new('MyRobot/1.0');
my $RobotSites  = {};

sub get_url {

  my $SiteMainUrl  = "http://some.other.site/";
  my $SiteUrlToGet = "http://some.other.site/somepage.html";

  # Fetch and parse robots.txt only the first time we see this site
  unless ($RobotSites->{$SiteMainUrl}) {
    my $Roboturl   = $SiteMainUrl . "robots.txt";
    my $robots_txt = get($Roboturl);
    $robotsrules->parse($Roboturl, $robots_txt);
    $RobotSites->{$SiteMainUrl} = 1;
  }

  if ($robotsrules->allowed($SiteUrlToGet)) {
    #
    # Do the url
    #
  }
}
-----------
Is there a better or more correct way to do it? If WWW::RobotRules itself
remembers which sites it has already parsed, could I ask it whether it
already has the information, so that I would not need to fetch robots.txt
again myself?
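
Something like the following is what I was hoping for. The has_rules_for()
method is just a name I made up to show the idea; I don't know whether
WWW::RobotRules actually offers anything like it:
-[example]---

# has_rules_for() is hypothetical -- only meant to show what I am
# asking for, not a real WWW::RobotRules method as far as I know.
unless ($robotsrules->has_rules_for($SiteMainUrl)) {
  my $Roboturl   = $SiteMainUrl . "robots.txt";
  my $robots_txt = get($Roboturl);
  $robotsrules->parse($Roboturl, $robots_txt);
}
-----------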

Best Regards
/Martin