----- Original Message -----
From: "Tim Bray" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, March 09, 2000 3:30 AM
Subject: Re: What happens once robots are barred?

> At 11:59 AM 3/8/00 -0800, Mark Bennett wrote:
>
> >* It should also keep track of "orphan" pages - pages that are still
> >accessible via the direct URL, but are no longer linked-to by other pages on
> >the site.
> >
> >I believe all 3 classes of pages should be removed from the index.
> >
> >The third item is an interesting one.  I know some spiders do NOT realize
> >that pages are no longer "linked to" and keep indexing them.
>
> When you're indexing a web *site* (i.e. you don't care about anything
> outside the web site), this is sensible.  When you're trying to build a
> large-scale index of the whole web, it gets more complex.  If a site has ever
> been announced to the outside world, the assumption is that it may have been
> linked to from elsewhere; the publishing of a page *should* represent a
> commitment on the part of the publisher to maintain it.  If the page needs to
> be removed, merely removing links to it is violently unsatisfactory since
> there is no way an incoming link from outside can know that it's now an
> orphan.  So such pages are a live part of the web until removed. -T.

Yes, so my question is: "Is it possible to remove a page from a search
engine's view of the web without removing the resource?"

To give an example, a company's annual report is published
(http://www.acme.com/annual-report-1999/) and submitted to several search
engines.  The following year the report is unlinked from the main area, but
linked from an archive area.  The company wishes to remove the report from
search engines.  Can this be done in a more elegant fashion than going to
every search engine and submitting a load of unsubmit requests?

What effect does having a <meta name="robots" content="noindex"> element have
for a resource which has already been indexed?

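To make the last question concrete, here is a minimal sketch of the behaviour I would hope for, assuming a spider that revisits pages it has already indexed and honours the robots meta element at that point. Whether any particular search engine actually works this way is exactly what I am asking; the names RobotsMetaParser, has_noindex and recrawl are hypothetical, and the "index" is just a dictionary standing in for a real one.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> elements."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def has_noindex(html):
    """Return True if the page carries a robots meta element with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives


def recrawl(index, url, html):
    """Hypothetical re-crawl step: drop the entry on noindex, else refresh it.

    Returns True if the page is in the index afterwards.
    """
    if has_noindex(html):
        index.pop(url, None)  # assumed behaviour, not a documented one
        return False
    index[url] = html
    return True


if __name__ == "__main__":
    url = "http://www.acme.com/annual-report-1999/"
    index = {url: "<html>copy indexed last year</html>"}
    page = ('<html><head><meta name="robots" content="noindex"></head>'
            "<body>Annual report</body></html>")
    recrawl(index, url, page)
    print(url in index)
```

Under this assumption the report stays on the server and stays reachable from the archive area, but drops out of the index the next time the spider fetches it. The catch is that nothing happens until the spider returns, so the page remains listed in the meantime.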
Brian
--------------------------------------------------------------------
Brian Kelly, UK Web Focus
UKOLN, University of Bath, BATH, England, BA2 7AY
Email: [EMAIL PROTECTED]
URL: http://www.ukoln.ac.uk/
Homepage: http://www.ukoln.ac.uk/ukoln/staff/b.kelly.html
Phone: 01225 323943
FAX: 01225 826838