This looks like a likely fix: it checks the disallow patterns against the
beginning of the path rather than anywhere in the URL.

Server.h
========
Remove the inline body of the IsDisallowed() method, leaving only the declaration.

Server.cc
=========
New method definition (moved from the header):

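int Server::IsDisallowed(String url)
{
    URL u(url);

    // Match against the path only, so that the "^" added to each pattern
    // in Server::robotstxt ties it to the start of the path.
    return _disallow.match(u.path(), 0, 0);
}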
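To illustrate what the change does, here is a small sketch using std::regex rather than ht://Dig's own regex and URL classes. `path_of`, `disallowed_old` and `disallowed_new` are hypothetical names, and `path_of` is only a simplified stand-in for `URL::path()`. With a robots.txt entry of "Disallow: /cgi-bin", the old unanchored search over the whole URL also blocks a page like http://www.example.com/docs/cgi-bin/intro.html, while the anchored search over the path does not.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not ht://Dig's URL class): return the path part of an
// absolute URL such as "http://host/path?query", i.e. everything from the
// first '/' after "scheme://host". Returns "/" if the URL has no path.
std::string path_of(const std::string& url)
{
    std::string::size_type scheme = url.find("://");
    std::string::size_type start = (scheme == std::string::npos) ? 0 : scheme + 3;
    std::string::size_type slash = url.find('/', start);
    return slash == std::string::npos ? "/" : url.substr(slash);
}

// Old behaviour (sketch): an unanchored pattern searched anywhere in the full URL.
bool disallowed_old(const std::string& url, const std::regex& unanchored)
{
    return std::regex_search(url, unanchored);
}

// Patched behaviour (sketch): a "^"-anchored pattern searched in the path only.
bool disallowed_new(const std::string& url, const std::regex& anchored)
{
    return std::regex_search(path_of(url), anchored);
}
```

For that URL, disallowed_old(url, std::regex("/cgi-bin")) returns true, while disallowed_new(url, std::regex("^/cgi-bin")) returns false, because the path "/docs/cgi-bin/intro.html" does not start with "/cgi-bin".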

And in Server::robotstxt, prefix each pattern with "^" so that it only
matches at the beginning of the path:

else if (pay_attention && mystrcasecmp(name, "disallow") == 0)
{
    if (debug > 1)
        cout << "Found 'disallow' line: " << rest << endl;

    //
    // Add this path to our list to ignore
    //
    if (*rest)
    {
        if (pattern.length())
            pattern << '|' << "^" << rest;
        else
            pattern << "^" << rest;
    }
}
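The pattern-building step can be sketched in standard C++ as follows. `build_disallow_pattern` and `path_disallowed` are hypothetical names, std::string and std::regex stand in for ht://Dig's String and regex classes, and treating an empty pattern as "nothing disallowed" is an assumption about how a robots.txt with no Disallow paths should behave. The sketch also shows why every alternative gets its own "^": in "^/a|/b" the anchor binds only to the first alternative, so "/b" would still match anywhere in the path.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical sketch of the Disallow handling in Server::robotstxt.
// Each non-empty Disallow value becomes one "^"-anchored alternative.
// As in the patch, values are NOT regex-escaped, so characters such as
// '.' or '?' in a robots.txt path are interpreted as regex syntax.
std::string build_disallow_pattern(const std::vector<std::string>& disallow_values)
{
    std::string pattern;
    for (const std::string& rest : disallow_values)
    {
        if (rest.empty())          // an empty Disallow line allows everything
            continue;
        if (!pattern.empty())
            pattern += '|';        // separate alternatives
        pattern += '^';            // anchor this alternative to the path start
        pattern += rest;
    }
    return pattern;
}

// Returns true if the path matches any disallowed prefix.
// Assumption: an empty pattern means nothing is disallowed.
bool path_disallowed(const std::string& pattern, const std::string& path)
{
    if (pattern.empty())
        return false;
    return std::regex_search(path, std::regex(pattern));
}
```

With the values {"/cgi-bin", "/tmp/"} the pattern is "^/cgi-bin|^/tmp/", which matches "/tmp/x.html" but not "/docs/tmp/x.html". The empty-pattern guard matters with std::regex, since an empty regex matches every string.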



Jamie Anstice
Search Scientist
S.L.I. Systems
[EMAIL PROTECTED]
ph:  64 961 3262
mobile: 64 21 264 9347

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
