According to Artem Sokovtcev:
> Please help me!
> I am using HTDIG 3.1.5.
You ought to upgrade to 3.1.6, not for the problem below, but for a number of other reasons. See http://www.htdig.org/RELEASE.html

> Why do I see these links in my URL list (url_list: /home/htdig/tmp/urls.txt)?
>
> somedomain/somefile.css
> mailto:[EMAIL PROTECTED]
> somedomain/somefile.doc
> somedomain/somefile.zip
>
> I don't want links with all these extensions in my url_list:
> /home/htdig/tmp/urls.txt!!
> I only want *.html and *.shtml files!

The purpose of url_list is to let you see all the links that htdig finds in the documents it indexes, not to list the URLs of the documents that were indexed. Just because a URL is in url_list doesn't mean it got indexed.

> How can I exclude files with unnecessary extensions and mailto: links?
>
> This is part of my htdig.conf:
> **********************************************************
> exclude_urls: /cgi-bin/ .cgi .pl .css .ssi mailto: footerssi.shtml
> headerssi.shtml
>
> bad_extensions: .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif \
> .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi\
> .js .pl .doc .css .mp3 .conf .db .aff .cfg .log .pid .ssi
> **********************************************************
>
> Why don't these directives from htdig.conf take effect???

They do have an effect on what htdig indexes, but they don't prevent htdig from seeing other links in documents and reporting them. If you want a list of only the URLs that htdig indexes, run htdig with -v and collect the standard output in a file. You can then use something like sed on that file to keep only the URLs and strip out the other information.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
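The "run htdig with -v, then filter the output" suggestion can be sketched in Python instead of sed. This is a minimal sketch under an assumption: that each progress line in the -v output has the shape `<documents parsed>:<DocID>:<hop count>:<URL>:` followed by link symbols, as in 3.1.x. Check a few lines of your own log before relying on it. The script name htdig_urls.py, the helper names, and the extension filter (.html and .shtml, plus directory URLs ending in "/") are hypothetical and only illustrate the original poster's request. They are not an htdig feature.

```python
import re
import sys
from typing import Iterable, Iterator, Tuple

# ASSUMED shape of an "htdig -v" progress line (verify against your log):
#   <docs parsed>:<DocID>:<hop count>:<URL>: <link symbols ...>
# A URL can itself contain colons (scheme, port), so match the shortest
# non-blank run that ends in a colon followed by whitespace or end of line.
PROGRESS_LINE = re.compile(r"^\d+:\d+:\d+:(\S+?):(?=\s|$)")

# Hypothetical filter: the extensions the poster said he wanted to keep.
WANTED_SUFFIXES: Tuple[str, ...] = (".html", ".shtml")


def extract_urls(lines: Iterable[str]) -> Iterator[str]:
    """Yield the URL from every line that looks like a progress line."""
    for line in lines:
        match = PROGRESS_LINE.match(line)
        if match:
            yield match.group(1)


def keep_wanted(urls: Iterable[str],
                suffixes: Tuple[str, ...] = WANTED_SUFFIXES) -> Iterator[str]:
    """Keep URLs whose path ends in a wanted suffix or in '/' (an index page)."""
    for url in urls:
        # Ignore any query string or fragment when checking the extension.
        path = url.split("?", 1)[0].split("#", 1)[0]
        if path.endswith(suffixes) or path.endswith("/"):
            yield url


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: htdig -v > dig.log ; python3 htdig_urls.py dig.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for url in keep_wanted(extract_urls(log)):
            print(url)
```

Lines that don't match the assumed pattern (for example server announcements) are skipped. Keeping directory URLs that end in "/" is a judgment call, since they usually return an HTML index page. Drop that check if you want strictly *.html and *.shtml.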

