This looks like a likely fix: it checks the disallow patterns against the
beginning of the URL's path rather than anywhere in the URL.
Server.h
========
Remove the body of the IsDisallowed() method, leaving only its declaration.
Server.cc
=========
New method (moved from header):
int Server::IsDisallowed(String url)
{
    URL u(url);
    return _disallow.match(u.path(), 0, 0);
}
In Server::robotstxt, add "^" to the start of each pattern so that it
matches only at the beginning of the path:
else if (pay_attention && mystrcasecmp(name, "disallow") == 0)
{
    if (debug > 1)
        cout << "Found 'disallow' line: " << rest << endl;
    //
    // Add this path to our list to ignore
    //
    if (*rest)
    {
        if (pattern.length())
            pattern << '|' << "^" << rest;
        else
            pattern << "^" << rest;
    }
}
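
To illustrate the effect of the two changes together, here is a minimal
standalone sketch. It uses std::regex rather than ht://Dig's own String, URL
and regex classes, and the names build_pattern and is_disallowed are
hypothetical helpers invented for this example, not part of ht://Dig. It
assumes the Disallow paths contain no regex metacharacters.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: join Disallow paths into one alternation,
// optionally prefixing each alternative with "^" as in the patch.
std::string build_pattern(const std::vector<std::string>& paths, bool anchored)
{
    std::string pattern;
    for (const std::string& p : paths) {
        if (p.empty())
            continue;               // an empty Disallow line adds nothing
        if (!pattern.empty())
            pattern += '|';
        if (anchored)
            pattern += '^';
        pattern += p;
    }
    return pattern;
}

// Hypothetical stand-in for the match call: true if the pattern is found
// somewhere in text ("^" restricts that to the start of text).
bool is_disallowed(const std::string& pattern, const std::string& text)
{
    if (pattern.empty())
        return false;
    return std::regex_search(text, std::regex(pattern));
}
```

With the paths "/cgi-bin/" and "/tmp/", the unanchored pattern matches the
full URL http://www.example.com/docs/tmp/notes.html because "/tmp/" occurs
in the middle of it. The anchored pattern matches the path /tmp/notes.html
but not /docs/tmp/notes.html. It also does not match a full URL, which is
why IsDisallowed() has to pass u.path() rather than the whole URL.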
Jamie Anstice
Search Scientist
S.L.I. Systems
[EMAIL PROTECTED]
ph: 64 961 3262
mobile: 64 21 264 9347