----- Original Message -----
From: "Gilles Detillieux" <[EMAIL PROTECTED]>
To: "Artem Sokovtcev" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Saturday, March 23, 2002 1:05 AM
Subject: Re: [htdig] HTDIG & url_list

> According to Artem Sokovtcev:
> > Please help me!
> > I'm using HTDIG 3.1.5
>
> You ought to upgrade to 3.1.6, not for the problem below, but for a
> number of other reasons. See http://www.htdig.org/RELEASE.html

I can't upgrade to 3.1.6, because I'm from Russia :(((
My ISP installed a modified version of 3.1.5 that works correctly with the
Russian encodings cp-1251 & koi8-r.
I can't modify 3.1.6 on my own to work correctly with Russian codepages.

> > Why do I see links like these in my URL LIST
> > (url_list: /home/htdig/tmp/urls.txt):
> >
> > somedomain/somrfile.css
> > mailto:[EMAIL PROTECTED]
> > somedomain/somrfile.doc
> > somedomain/somrfile.zip
> >
> > I don't want all these links with all these extensions in my url_list:
> > /home/htdig/tmp/urls.txt!!
> > I want only *.html and *.shtml files!
>
> The purpose of the url_list is to be able to see all the links that htdig
> sees in the documents it indexes, not to see the URLs of the documents
> that are indexed. Just because a URL is in url_list doesn't mean it
> got indexed.

OK, I understand.

> > How can I disable files with unnecessary extensions & mailto: links?
> >
> > This is part of my htdig.conf:
> > **********************************************************
> > exclude_urls: /cgi-bin/ .cgi .pl .css .ssi mailto: footerssi.shtml
> > headerssi.shtml
> >
> > bad_extensions: .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif \
> > .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi\
> > .js .pl .doc .css .mp3 .conf .db .aff .cfg .log .pid .ssi
> > **********************************************************
> >
> > Why don't these directives from htdig.conf have the necessary effect???
>
> They do have an effect on what htdig indexes, but they don't prevent
> htdig from seeing other links in documents and reporting them.
>
> If you want to get a list of only the URLs that htdig indexes, run htdig
> with -v, and collect the standard output in a file.
> You can then use something like sed on that file to keep only the
> URLs and strip out the other information.

Very nice option!

> --
> Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
> Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
> Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
> Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
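
The filtering step suggested in the reply (save the output of htdig -v, then keep only the URLs) could also be sketched in Python rather than sed. This is only a rough sketch: the thread does not show what the -v output looks like, so the code assumes nothing more than that each line of interest contains one absolute http:// or https:// URL. The function name indexed_urls, the sample lines, and the default extension list are all hypothetical.

```python
import re

# Assumption: each line of interest contains one absolute http(s) URL.
URL_RE = re.compile(r"https?://[^\s<>\"]+")


def indexed_urls(lines, extensions=(".html", ".shtml")):
    """Return the URLs found in `lines` whose path ends with one of `extensions`."""
    wanted = tuple(extensions)
    found = []
    for line in lines:
        match = URL_RE.search(line)
        if match is None:
            continue  # no URL on this line
        # Drop trailing punctuation, such as a ':' separator after the URL.
        url = match.group(0).rstrip(":,;.")
        # Compare only the path, ignoring any query string or fragment.
        path = url.split("?", 1)[0].split("#", 1)[0]
        if path.lower().endswith(wanted) and url not in found:
            found.append(url)
    return found


if __name__ == "__main__":
    # Hypothetical sample lines, NOT real htdig output.
    sample = [
        "0:0:0:http://somedomain/index.html: size = 1234",
        "1:1:1:http://somedomain/style.css: not indexed",
        "2:2:1:http://somedomain/news.shtml: size = 987",
        "no URL on this line",
    ]
    for url in indexed_urls(sample):
        print(url)
```

To run it on a saved log, pass an open file object instead of the sample list. Note that URLs ending in a directory (for example http://somedomain/) are dropped by the extension test, so the tuple of extensions may need adjusting for a real site.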

