----- Original Message -----
From: "Gilles Detillieux" <[EMAIL PROTECTED]>
To: "Artem Sokovtcev" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Saturday, March 23, 2002 1:05 AM
Subject: Re: [htdig] HTDIG & url_list


> According to Artem Sokovtcev:
> > Please help me!
> > I am using HTDIG 3.1.5
>
> You ought to upgrade to 3.1.6, not for the problem below, but for a
> number of other reasons.  See http://www.htdig.org/RELEASE.html



I can't upgrade to 3.1.6, because I am from Russia :(((

My ISP installed a modified version of 3.1.5 that works correctly with
the Russian encodings cp-1251 and koi8-r.
I can't modify 3.1.6 on my own to make it work correctly with the Russian
code pages.




>
> > Why do I see links like these in my URL list
> > (url_list: /home/htdig/tmp/urls.txt)?
> >
> > somedomain/somrfile.css
> > mailto:[EMAIL PROTECTED]
> > somedomain/somrfile.doc
> > somedomain/somrfile.zip
> >
> > I don't want all these links, with all these extensions, in my url_list:
> > /home/htdig/tmp/urls.txt!!
> > I want only *.html and *.shtml files!
>
> The purpose of the url_list is to be able to see all the links that htdig
> sees in the documents it indexes, not to see the URLs of the documents
> that are indexed.  Just because a URL is in url_list, doesn't mean it
> got indexed.
>
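
Since url_list records every link htdig sees, one way to get the list asked for above is to filter the file after the run. The following is only a rough sketch, not part of ht://Dig: it assumes urls.txt holds one URL per line, and the function name keep_wanted is made up for illustration.

```python
from urllib.parse import urlsplit

# Extensions to keep, taken from the question (*.html, *.shtml).
WANTED_EXTENSIONS = (".html", ".shtml")


def keep_wanted(lines, extensions=WANTED_EXTENSIONS):
    """Return the entries whose URL path ends in one of the extensions."""
    kept = []
    for line in lines:
        url = line.strip()
        # Skip blank lines and mailto: links outright.
        if not url or url.lower().startswith("mailto:"):
            continue
        # Compare against the path only, so a query string or fragment
        # after the file name does not hide the extension.
        path = urlsplit(url).path.lower()
        if path.endswith(extensions):
            kept.append(url)
    return kept


if __name__ == "__main__":
    import sys

    # Assumed usage: python filter_urls.py /home/htdig/tmp/urls.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for url in keep_wanted(handle):
            print(url)
```

This only trims the report; it does not change what htdig indexes, which is still governed by the configuration.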

OK, I understand.


> > How can I exclude files with unnecessary extensions and mailto: links?
> >
> > This is part of my htdig.conf:
> > **********************************************************
> > exclude_urls:  /cgi-bin/ .cgi .pl .css .ssi mailto: footerssi.shtml headerssi.shtml
> >
> > bad_extensions:  .wav .gz .z .sit .au .zip .tar .hqx .exe .com .gif \
> >   .jpg .jpeg .aiff .class .map .ram .tgz .bin .rpm .mpg .mov .avi\
> >   .js .pl .doc .css .mp3 .conf .db .aff .cfg .log .pid .ssi
> > **********************************************************
> >
> > Why don't these directives from htdig.conf have the desired effect?
>
> They do have an effect on what htdig indexes, but they don't prevent
> htdig from seeing other links in documents and reporting them.
>
> If you want to get a list of only the URLs that htdig indexes, run htdig
> with -v, and collect the standard output in a file.  You can then use
> something like sed on that file to keep only the URLs and strip out the
> other information.
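
For anyone who would rather not write the sed expression, here is a hedged Python sketch of the same idea. The exact layout of the htdig -v output is not shown in this thread, so the code assumes only that each relevant line contains an http or https URL delimited by whitespace, possibly followed by a colon. The function name urls_from_log is hypothetical, and a line that mentions a URL is not proof that the document was indexed, so check the result against real output.

```python
import re

# Matches an http or https URL up to the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")


def urls_from_log(lines):
    """Return the first URL found on each line, in order, without duplicates."""
    seen = set()
    urls = []
    for line in lines:
        match = URL_PATTERN.search(line)
        if match is None:
            continue
        # Assumption: the log may append punctuation such as a colon
        # directly after the URL, so strip it from the end.
        url = match.group(0).rstrip(":,;")
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    import sys

    # Assumed usage: htdig -v > htdig.log, then
    #                python extract_urls.py < htdig.log
    for url in urls_from_log(sys.stdin):
        print(url)
```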


Very nice option!

>
> --
> Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
> Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
> Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
> Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
>


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
