I recently built and installed htdig-3.2.0b3, and it is
working pretty well.  However, it is indexing more documents
than I would like.  I can't get into ftp.htdig.org right
now to search the contrib stuff (server not responding).
Is there a script there that lists, in plain text, the
documents that have been indexed, reading from, say, the
db.docs.index file?

Also, is my assumption correct that if a document is
excluded via exclude_urls then no links in it are followed
for indexing, even if those links wouldn't by themselves
be excluded?

Finally, what would be the best way to avoid having
"equivalent" documents indexed multiple times when they
are referenced by slightly different URLs, such as:

http://websource.wrlc.org:8000/voyager/stgfac/
http://websource.wrlc.org:8000/voyager/stgfac/index.html
http://websource.wrlc.org:8000/voyager/stgfac/?N=D
http://websource.wrlc.org:8000/voyager/stgfac/?D=A

and so on?
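
To make concrete what I mean by "equivalent", here is a rough
Python sketch of the collapsing I have in mind.  It is not
anything htdig does or any htdig API; the function name
canonical_url, the DEFAULT_DOCS list, and the guess that the
one-letter queries are directory-listing sort options are all
my own assumptions for illustration.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: index.html is the only default document on this server.
DEFAULT_DOCS = ("index.html",)

# Assumption: queries like ?N=D or ?D=A only change the sort order of a
# directory listing, so they do not identify a different document.
SORT_QUERY = re.compile(r"[NMSD]=[AD]")


def canonical_url(url):
    """Map the URL variants shown above onto one directory URL (hypothetical helper)."""
    parts = urlsplit(url)
    path = parts.path

    # Strip a trailing default document name, keeping the final slash.
    for doc in DEFAULT_DOCS:
        if path.endswith("/" + doc):
            path = path[: -len(doc)]
            break

    # Drop the query only when it is exactly one sort option.
    query = "" if SORT_QUERY.fullmatch(parts.query) else parts.query

    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


urls = [
    "http://websource.wrlc.org:8000/voyager/stgfac/",
    "http://websource.wrlc.org:8000/voyager/stgfac/index.html",
    "http://websource.wrlc.org:8000/voyager/stgfac/?N=D",
    "http://websource.wrlc.org:8000/voyager/stgfac/?D=A",
]
unique = {canonical_url(u) for u in urls}
```

With the four URLs above, I would want unique to end up
holding a single entry, the bare directory URL.  Any query
that is not a lone sort option is left alone, since it might
select genuinely different content.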

Thanks for any advice that you can provide,

-Don

