According to Artem Sokovtcev:
> Please help me!
> I using HTDIG 3.1.5

You ought to upgrade to 3.1.6, not for the problem below, but for a
number of other reasons.  See http://www.htdig.org/RELEASE.html

> Why in my URL LIST (url_list: /home/htdig/tmp/urls.txt) i see links:
> 
> somedomain/somrfile.css
> mailto:[EMAIL PROTECTED]
> somedomain/somrfile.doc
> somedomain/somrfile.zip
> 
> I no want have all this links with all extensions in my url_list:
> /home/htdig/tmp/urls.txt!!
> I want have only *.html, *.shtml files!

The url_list file records every link that htdig sees in the documents
it indexes, not just the URLs of the documents that actually get
indexed.  A URL appearing in url_list does not mean it was indexed.

> How i may disabled files with unnecessary extensions & mailto: links?
> 
> This is part of my htdig.conf:
> **********************************************************
> exclude_urls:  /cgi-bin/ .cgi .pl .css .ssi mailto: footerssi.shtml
> headerssi.shtml
> 
> bad_extensions:  .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif \
>   .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi\
>   .js .pl .doc .css .mp3 .conf .db .aff .cfg .log .pid .ssi
> **********************************************************
> 
> Why this directives from htdig.conf do not take necessary effect???

They do have an effect on what htdig indexes, but they don't prevent
htdig from seeing other links in documents and reporting them.

If you want a list of only the URLs that htdig indexes, run htdig with
-v and redirect its standard output to a file.  You can then use
something like sed on that file to keep only the URLs and strip out
the other information.
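
If sed is awkward for this, the same filtering can be sketched in Python.
This is only a rough illustration, not something shipped with htdig: it
assumes each URL in the -v output shows up as a plain http:// or https://
token, possibly followed by a colon, and the function name
extract_indexed_urls and the suffix list are made up for the example.
Check it against your own output before relying on it.

```python
import re
import sys
from urllib.parse import urlsplit

# Assumption: URLs appear in the verbose output as whitespace-delimited
# http:// or https:// tokens. The real htdig -v layout may differ.
URL_RE = re.compile(r"https?://[^\s<>\"']+")

# Hypothetical choice for this example: keep only these page types.
WANTED_SUFFIXES = (".html", ".shtml")


def extract_indexed_urls(lines, suffixes=WANTED_SUFFIXES):
    """Return unique URLs, in first-seen order, whose path ends in a wanted suffix."""
    seen = set()
    result = []
    for line in lines:
        for raw in URL_RE.findall(line):
            # Drop separator punctuation that may trail the URL in the log.
            url = raw.rstrip(":,;")
            try:
                path = urlsplit(url).path.lower()
            except ValueError:
                # Skip tokens that are not parseable as URLs.
                continue
            if path.endswith(suffixes) and url not in seen:
                seen.add(url)
                result.append(url)
    return result


if __name__ == "__main__":
    # Read the saved verbose output on stdin, print one URL per line.
    for url in extract_indexed_urls(sys.stdin):
        print(url)
```

For example, after saving the verbose output with "htdig -v > htdig.log",
you could run "python3 extract_urls.py < htdig.log > indexed.txt" (both
file names and the script name are placeholders).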

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
