At 10:35 AM 11/2/2001 -0500, Geoff wrote:
> > Is there a way I can index the contents of this file, but not the file
> > itself?
> >
> > I tried adding
> > exclude_urls: /adlist/index.html
>
>Keep in mind if you index a file and it still exists, it will be left in
>the database.
>
>I think what you want is the "follow, noindex" directive:
><META name="robots" content="follow, noindex">
>
>So links on this page will be followed by robots (including ht://Dig) but
>the page will not be indexed. In actuality, the page will be indexed
>somewhat but marked to be removed by htmerge/htpurge.

There's one more complication: some of the ads are free and some are paid.
The site owner doesn't want the free ones picked up by outside search
engines, because they run for a short time and may be deleted while still
showing up in outside search results.

To get around this, I modified HTML.cc to ignore the follow/noindex meta
tag. So what I really need is a config option to ignore or honor this tag
in specific files.

Maybe this is too specific a case for a general config item. Perhaps a new
meta tag to follow/noindex a document on internal searches, one that would
be ignored by external search bots?
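
To make the idea concrete, here is a rough sketch in Python (not ht://Dig code) of the per-file decision I have in mind. The names `decide` and `ignore_noindex_patterns` are made up for illustration; no such config attribute exists in ht://Dig as far as I know, and the substring matching is only assumed to work the way exclude_urls patterns do.

```python
def parse_robots(content):
    """Split a robots meta content string into lowercase directives."""
    return {token.strip().lower() for token in content.split(",") if token.strip()}


def decide(url, robots_content, ignore_noindex_patterns):
    """Return (index, follow) for one page.

    ignore_noindex_patterns is a HYPOTHETICAL list of URL substrings
    for which the internal indexer would disregard "noindex".
    """
    directives = parse_robots(robots_content)
    # "none" is shorthand for "noindex, nofollow".
    follow = "nofollow" not in directives and "none" not in directives
    index = "noindex" not in directives and "none" not in directives
    # Internal search overrides the tag only for the listed URLs.
    if not index and any(pattern in url for pattern in ignore_noindex_patterns):
        index = True
    return index, follow


if __name__ == "__main__":
    patterns = ["/adlist/free/"]  # hypothetical location of the free ads
    print(decide("http://example.com/adlist/free/1.html", "follow, noindex", patterns))
    print(decide("http://example.com/adlist/index.html", "follow, noindex", patterns))
```

With something like this, the free ads could keep their noindex tag for outside robots while the internal index still picks them up, and /adlist/index.html would still be followed but not indexed.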

