On 2 Jan 2002 at 14:35, Geoff Hutchison wrote:

> On Tue, 1 Jan 2002, Dan Langille wrote:
>
> > I've found an instance where a document contained in robots.txt is
> > included in the final index. Not sure if this is a bug or a feature.
>
> I believe I know what you're talking about.
>
> At the moment, if a document is already in the database and the robots.txt
> forbids it, then it will be left in the database. This could happen if you
> only update old databases.
The above scenario isn't what I'm testing with, but it's close:

1. Start with no database.
2. Ensure robots.txt excludes ottawa-pics.php.
3. Index the entire site.
4. Search for 'Ikea', which occurs only in that excluded document. Not found.
5. Set start.url = ottawa-pics.php, leaving robots.txt unchanged.
6. Run merge.
7. Search for 'Ikea'. Found it.

I've just rerun the above test to verify. Is there anything I can provide that would help?

--
Dan Langille
The FreeBSD Diary - http://freebsddiary.org/ - practical examples
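For comparison, here is a minimal sketch of the check the test above expects to be applied to every URL, including one supplied via start.url. It uses Python's standard urllib.robotparser, not htdig's own code. The host name (www.example.org), the user-agent string, and the robots.txt contents are hypothetical stand-ins for the setup described in the steps.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt mirroring step 2: ottawa-pics.php is excluded
# for all user agents.
ROBOTS_TXT = """\
User-agent: *
Disallow: /ottawa-pics.php
"""


def is_excluded(url: str, agent: str = "htdig") -> bool:
    """Return True if robots.txt forbids fetching url for this agent.

    The agent name is an assumption; any agent matches the '*' entry here.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    # A start URL should pass the same test as any URL found while crawling.
    for url in ["http://www.example.org/ottawa-pics.php",
                "http://www.example.org/index.php"]:
        print(url, "excluded" if is_excluded(url) else "allowed")
```

Under this assumption, ottawa-pics.php is reported as excluded whether it is reached by following links or named directly as the start URL, which is the behaviour the 'Ikea' search in step 7 suggests is not happening.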

