At 10:35 AM 11/2/2001 -0500, Geoff wrote:
> > Is there a way I can index the contents of this file, but not the file
> > itself?
> >
> > I tried adding
> > exclude_urls: /adlist/index.html
>
>Keep in mind if you index a file and it still exists, it will be left in
>the database.
>
>I think what you want is the "follow, noindex" directive:
><META name="robots" content="follow, noindex">
>
>So links on this page will be followed by robots (including ht://Dig) but
>the page will not be indexed. In actuality, the page will be indexed
>somewhat but marked to be removed by htmerge/htpurge.

There's one more complication: some of the ads are free and some are paid.
The site owner doesn't want the free ones picked up by outside search
engines, because they run for only a short time and may be deleted while
still showing up in those engines' results.

To get around this, I modified HTML.cc to ignore the follow/noindex meta 
tag.  So what I really need is a config option to ignore or honor this tag 
in specific files.
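
To make the idea concrete, here is a rough sketch in Python (not ht://Dig
code) of how such a per-file switch might decide whether to honor the robots
meta tag. The function name, the pattern list, and the /adlist/free/ path are
all hypothetical; no such config attribute exists as far as I know.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def honors_robots_meta(url, ignore_patterns):
    """Return True if the robots meta tag should be honored for this URL.

    ignore_patterns is a hypothetical list of shell-style path patterns;
    a document whose path matches any of them has its robots meta tag
    ignored by the internal indexer.
    """
    path = urlsplit(url).path
    return not any(fnmatchcase(path, pattern) for pattern in ignore_patterns)


# Hypothetical setting: ignore the tag only on the free ads.
ignore_patterns = ["/adlist/free/*"]

for url in ("http://www.example.com/adlist/free/123.html",
            "http://www.example.com/adlist/paid/123.html"):
    action = "honor" if honors_robots_meta(url, ignore_patterns) else "ignore"
    print(url, "->", action, "robots meta tag")
```

Matching on the URL path rather than the full URL keeps the patterns
independent of the host name, which seemed the simpler choice here.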

Maybe this is too specific a case for a general config item.  Perhaps a new 
meta tag to follow/noindex a document on internal searches that would be 
ignored by external search bots?
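
Roughly what I have in mind, again as a Python sketch and not ht://Dig code:
the internal indexer looks for a second meta tag and lets it override the
standard robots tag, while outside bots never look at it. The tag name
"internal-robots" is made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from the robots tag and a hypothetical internal tag."""

    def __init__(self, internal_name="internal-robots"):
        super().__init__()
        self.internal_name = internal_name
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        if name in ("robots", self.internal_name):
            content = attr.get("content", "")
            self.directives[name] = {
                token.strip().lower()
                for token in content.split(",")
                if token.strip()
            }


def internal_directives(html, internal_name="internal-robots"):
    """Return the directives the internal indexer should apply.

    The internal tag wins when present; otherwise fall back to the
    standard robots tag, and to an empty set if neither is present.
    """
    parser = RobotsMetaParser(internal_name)
    parser.feed(html)
    parser.close()
    found = parser.directives
    return found.get(internal_name, found.get("robots", set()))


page = """<html><head>
<META name="robots" content="noindex, nofollow">
<META name="internal-robots" content="follow, noindex">
</head><body>ad list</body></html>"""

print(sorted(internal_directives(page)))
```

With a page like the one above, outside engines would see only the standard
robots tag, and the internal indexer would follow the links without indexing
the list page itself.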


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
