According to Torsten Neuer:
> According to Geoff Hutchison:
> >On Fri, 11 Jun 1999, Luc Schiltz wrote:
> >> I'm indexing all *.lu sites but I've seen that htdig v3.1.2 indexes *.lu.se too
> >
> >The patterns you give in limit_urls_to are matched anywhere in the
> >URL, so limiting to .lu matches .lu.se as well.
> >
> >> is there any way to exclude those *.lu.se sites?
> >
> >exclude_urls: .lu.se
> 
> I think he used some http://foo.bar.lu as a start_url.
> Changing this to http://foo.bar.lu/ (with trailing slash) will
> most probably cause limit_urls_to to work as expected.

That's a good start, but he set limit_urls_to to simply ".lu", which will
still match .lu.se regardless of what start_url is set to.  However, with
the trailing slash in place on start_url, he could then set limit_urls_to
to ".lu/", which will not match ".lu.se".  Setting exclude_urls is a good
approach too, if there are only a few known exceptions that you want to
get rid of.
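
As a rough sketch of what that could look like in the config file (the
host name is just the foo.bar.lu placeholder from above, not his real
start URL, and the exclude_urls line is optional once limit_urls_to
ends in a slash):

```
# Placeholder start URL; note the trailing slash.
start_url:      http://foo.bar.lu/

# Matched as a substring of the URL: ".lu/" is found in
# http://foo.bar.lu/page.html but not in http://www.foo.lu.se/
limit_urls_to:  .lu/

# Optional extra safety net for the known exception.
exclude_urls:   .lu.se
```

Keep in mind that setting exclude_urls replaces whatever value it had
before, so if you rely on other exclusion patterns you would need to
list them on that line as well.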

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930