According to Eric Bliss:
> Htdig has been acting well for us for some time now, but there is one glitch that
> has been brought to my attention.
>
> We have a number of websites which are updated on a regular basis.  Because of this,
> old pages are being unlinked every week from the main body of the site.  To keep
> these pages in the search engine database (as opposed to being lost forever), I've
> created a page for each website that just consists of the URLs of each of these
> pages.  At the top of these pages, I place the meta tags to tell htdig to follow the
> links, but not index the page: <META NAME="ROBOTS" CONTENT="NOINDEX">.  I use these
> pages as the base documents for htdig to crawl from.
>
> My problem is that although htdig's website says that it follows the robot rules, my
> index documents still show up when a search is done.  Is there a different tag I
> should be using, or do you need to specify a setting in htdig for it to obey robot
> rules?


There's a subtle bug in 3.1.5 and earlier versions.  The CONTENT value of the
robots meta tag should be matched case-insensitively, but htdig only recognized
lower-case directives, so CONTENT="NOINDEX" was ignored.  You can either change
the tag to lower case, e.g. <META NAME="ROBOTS" CONTENT="noindex">, or apply
this patch to fix the code:

   ftp://ftp.ccsf.org/htdig-patches/3.1.5/robotsCaseI.0
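For illustration only, here is a minimal Python sketch of what case-insensitive handling of the robots meta tag looks like.  It is not htdig's code (htdig is written in C++, and the real fix is the patch above); the class and function names are hypothetical, and the sketch assumes directives are comma-separated as in "noindex, follow".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: collects directives from <meta name="robots"> tags.

    HTMLParser already lower-cases tag and attribute *names*; the NAME and
    CONTENT *values* are folded here so "NOINDEX" and "noindex" match alike.
    """

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Attribute values may be None (e.g. <meta name>), so default to "".
        if (attrs.get("name") or "").lower() != "robots":
            return
        content = attrs.get("content") or ""
        # Split on commas and lower-case each directive before storing it.
        for token in content.split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def should_index(html):
    # Index the page unless a NOINDEX directive is present, in any case.
    return "noindex" not in robots_directives(html)


def should_follow(html):
    # Follow links unless a NOFOLLOW directive is present, in any case.
    return "nofollow" not in robots_directives(html)
```

With this approach, a page carrying <META NAME="ROBOTS" CONTENT="NOINDEX"> gives should_index() == False and should_follow() == True, which is the "follow the links but don't index this page" behaviour the original poster wanted.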

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

------------------------------------
To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.
List archives:  <http://www.htdig.org/mail/menu.html>
FAQ:            <http://www.htdig.org/FAQ.html>
