According to karin kosina:
> thank you so much - that was, indeed, the point that i had missed.
>
> > It won't find documents that aren't directly or indirectly referenced
> > by <a href=...> tags in your start_url document(s).  If you want to
> > index all documents on your site, whether linked or not, you'll need
> > to produce a list of them and use that as your start_url
>
> that list - does that have to be a list of <a href=...>filename</a> 's ?
> and if yes, how do i get that easily?

There are two ways of doing this, and only one of them requires hrefs as
above.

1) You could generate a file containing plain URLs (not hrefs), one per line,
for each of the documents you want indexed.  E.g.:

  find /home/httpd/html -type f -name '*.html' -print | \
        sed 's|^/home/httpd/html|http://www.mydomain.org|' \
        > /etc/htdig/urls_to_index

and then put this in your htdig.conf (the backquotes tell htdig to read the
attribute's value from the named file):

  start_url:    `/etc/htdig/urls_to_index`

2) Alternatively, you could generate a proper HTML document that contains
an <a href=...> link for each and every document you want indexed, and then
use the URL of that generated file as your start URL, e.g.:

  start_url:    http://www.mydomain.org/data/linkstoindex.html
  limit_urls_to: http://www.mydomain.org/

In this second case, you need to override limit_urls_to, because it defaults
to the value of start_url, and in my example that would be too restrictive:
only URLs matching the link page's own URL would be indexed.
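
For the second approach, here is a rough sketch of how that link page might
be generated. The function name make_link_page is made up for illustration,
and the document root and base URL are just the example values used above,
so adjust them for your site. It assumes file names contain nothing that
needs URL-escaping or that is special to sed (such as "&" or "|").

```shell
# Hypothetical helper: print an HTML page with one link per .html file.
# Usage: make_link_page DOCROOT BASEURL
make_link_page() {
  docroot=$1
  baseurl=$2
  echo '<html><head><title>Links to index</title></head><body>'
  # Turn each file path into a URL, then wrap the URL in an <a href> tag.
  find "$docroot" -type f -name '*.html' -print |
    sed -e "s|^$docroot|$baseurl|" \
        -e 's|.*|<a href="&">&</a><br>|'
  echo '</body></html>'
}
```

You would then run something like this before each htdig run (the data
directory has to exist already):

  make_link_page /home/httpd/html http://www.mydomain.org \
        > /home/httpd/html/data/linkstoindex.html

Because the output file lives under the document root, the page will usually
list itself as well, which is harmless.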

See http://www.htdig.org/attrs.html for a description of these and other
config file attributes, to get a better understanding of how they work.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

