>Number: 1057
>Category: mod_dir
>Synopsis: Web robots should be told not to index auto-generated index pages
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: apache (Apache HTTP Project)
>State: open
>Class: change-request
>Submitter-Id: apache
>Arrival-Date: Tue Aug 26 08:10:01 1997
>Originator: [EMAIL PROTECTED]
>Organization:
apache
>Release: 1.3a1
>Environment:
Linux noxious.muscat.co.uk 2.0.18 #1 Tue Sep 10 10:15:48 EDT 1996 i586
>Description:
A web robot rarely wants to add auto-generated pages to its database.
But it can't reliably spot them. Apache could help a lot by marking
such pages as not to be indexed by putting:
  <META NAME=robots CONTENT=noindex>
into the HTML <HEAD>...</HEAD> section. This still allows compliant
robots to follow links on the page, which is probably what's wanted.

See <URL:http://info.webcrawler.com/mak/projects/robots/exclusion.html#meta>
for details of the protocol.
>How-To-Repeat:
Look at:
http://www.altavista.digital.com/cgi-bin/query?pg=q&what=web&kl=XX&q=title%3A%22Index+of%22+%22parent+directory%22
which gives "about 274150" examples.
>Fix:
Here's a patch to 1.3a1 -- the change is actually to mod_autoindex,
but that's not available in the picker on the bug report form.

--- src/mod_autoindex.c Mon Jul 21 06:53:49 1997
+++ src.mod/mod_autoindex.c Tue Aug 26 11:43:28 1997
@@ -122,6 +122,9 @@
  * This routine puts the standard HTML header at the top of the index page.
  * We include the DOCTYPE because we may be using features therefrom (i.e.,
  * HEIGHT and WIDTH attributes on the icons if we're FancyIndexing).
+ * "<META NAME=robots CONTENT=noindex>" tells robots which support the protocol
+ * that they shouldn't index this page (but that they can follow links).
+ * See <URL:http://info.webcrawler.com/mak/projects/robots/exclusion.html#meta>
  */
 static void emit_preamble(request_rec *r, char *title)
 {
@@ -131,7 +134,7 @@
 	"<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 3.2 Final//EN\">\n",
 	"<HTML>\n <HEAD>\n  <TITLE>Index of ",
 	title,
-	"</TITLE>\n </HEAD>\n <BODY>\n",
+	"</TITLE>\n  <META NAME=robots CONTENT=noindex>\n </HEAD>\n <BODY>\n",
 	NULL
     );
 }
>Audit-Trail:
>Unformatted:
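For anyone who wants to check the resulting markup without building the server, here is a rough standalone sketch of what the patched preamble should look like. It is not the Apache code: build_preamble() and has_noindex_in_head() are hypothetical names made up for illustration, and the sketch writes into a caller-supplied buffer with snprintf() rather than using rvputs() and a request_rec.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the patched emit_preamble(): it writes the
 * preamble into a caller-supplied buffer instead of calling Apache's
 * rvputs(), so it compiles without the server headers.
 * Returns the number of characters written, or -1 if buf is too small. */
static int build_preamble(char *buf, size_t size, const char *title)
{
    int n;

    assert(buf != NULL && title != NULL);
    n = snprintf(buf, size,
                 "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 3.2 Final//EN\">\n"
                 "<HTML>\n <HEAD>\n  <TITLE>Index of %s</TITLE>\n"
                 "  <META NAME=robots CONTENT=noindex>\n"
                 " </HEAD>\n <BODY>\n",
                 title);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}

/* Returns 1 if the robots META tag appears before the end of <HEAD>,
 * which is where a compliant robot would expect to find it. */
static int has_noindex_in_head(const char *html)
{
    const char *meta = strstr(html, "<META NAME=robots CONTENT=noindex>");
    const char *head_end = strstr(html, "</HEAD>");

    return meta != NULL && head_end != NULL && meta < head_end;
}
```

Calling build_preamble(buf, sizeof buf, "/pub") and then has_noindex_in_head(buf) should confirm that the tag lands inside <HEAD>, after the TITLE element, as in the patch above.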
