On 2 Jan 2002 at 14:35, Geoff Hutchison wrote:

> On Tue, 1 Jan 2002, Dan Langille wrote:
> 
> > I've found an instance where a document listed in robots.txt is 
> > included in the final index.  Not sure if this is a bug or a feature.
> 
> I believe I know what you're talking about.
> 
> At the moment, if a document is already in the database and the robots.txt
> forbids it, then it will be left in the database. This could happen if you
> only update old databases.

The above scenario isn't what I'm testing with.  But it's close.

I start with no database and make sure robots.txt excludes ottawa-pics.php.  
I index the entire site, then search for 'Ikea', which occurs only in that 
excluded document.  Not found.  I then set start_url to ottawa-pics.php, 
leave robots.txt unchanged, and run the merge.  Searching for 'Ikea' again 
finds it.
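
To rule out a malformed robots.txt as the cause, the exclusion rule itself can be checked independently of ht://Dig. The following is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the host www.example.org, and the "htdig" user-agent string are assumptions for illustration, since the real file is not shown in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the actual file from the site is not shown here.
ROBOTS_TXT = """\
User-agent: *
Disallow: /ottawa-pics.php
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines rather than a single string.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # www.example.org is a placeholder host, not the real site.
    for path in ("/ottawa-pics.php", "/index.php"):
        url = "http://www.example.org" + path
        verdict = "allowed" if is_allowed(ROBOTS_TXT, "htdig", url) else "excluded"
        print(path, verdict)
```

If the excluded page is reported as excluded here but still turns up in search results, the rule is well formed and the question is whether the indexer consults robots.txt for a URL given directly as the start URL.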

I've just rerun the above test to verify.  Is there anything I can provide 
that would help?
-- 
Dan Langille
The FreeBSD Diary - http://freebsddiary.org/ - practical examples


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
