Hi, I searched the archives for my problem but couldn't find any hints.

The problem is:
For a couple of weeks now we have been using htdig to index 100+ servers
at our organisation. These servers are a mix of Unix and IIS (the latter
being case-insensitive). So from the beginning I set case_sensitive to
true, thinking that both server types would be indexed correctly.
However, it seems that a lot of IIS servers have quite large databases
with all kinds of URLs pointing to them that differ only in upper/lowercase,
resulting in an enormous growth in the number of pages read from those
servers, for example:

../books/abstracts/xyz
../Books/abstracts/xyz
../books/Abstracts/xyz
etc.

The number of visits grew from about 5000 to ~25000.
The obvious solution would be case_sensitive=false; however, the Unix
servers would then be indexed incompletely.

Perhaps I don't fully understand how 'case_sensitive' works, but
wouldn't it be a solution if one could set case_sensitive=false, visit
every website with the unconverted URL (exactly as found in a document),
and do only the URL comparison that prevents multiple accesses to the
same page in lowercase?
Of course this could result in a loss of information on the Unix servers
(two pages whose URLs differ only in case would be treated as one), but as
far as I can see here, there would be no such loss in our case.
I don't see the need to convert the URL to lowercase BEFORE fetching the
page when 'case_sensitive=false'.
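To make the idea concrete, here is a minimal Python sketch of the proposed behaviour, assuming a simple breadth-first crawl. `crawl_order` and the `links` mapping are hypothetical illustrations, not part of htdig: each URL is fetched exactly as it was found, and only the "have we seen this already?" check uses the lowercased form.

```python
from collections import deque


def crawl_order(start_urls, links):
    """Return URLs in the order a hypothetical crawler would fetch them.

    start_urls: initial URLs to visit.
    links: hypothetical dict mapping a fetched URL to the URLs found on
           that page (stands in for real fetching and parsing).
    """
    seen = set()      # lowercased keys of URLs already queued
    queue = deque()   # URLs waiting to be fetched, in their original case

    def enqueue(url):
        key = url.lower()          # case-insensitive duplicate check only
        if key not in seen:
            seen.add(key)
            queue.append(url)      # keep the URL exactly as it was found

    for url in start_urls:
        enqueue(url)

    fetched = []
    while queue:
        url = queue.popleft()
        fetched.append(url)        # a real crawler would fetch `url` unchanged
        for found in links.get(url, []):
            enqueue(found)
    return fetched


links = {
    "../books/abstracts/xyz": [
        "../Books/abstracts/xyz",   # same page on IIS, different case
        "../books/Abstracts/xyz",   # same page on IIS, different case
        "../books/other",
    ],
}
print(crawl_order(["../books/abstracts/xyz"], links))
```

With the case variants from the example above, only two pages are fetched instead of four, and a Unix URL such as `/Docs/Index.html` would still be requested in its original case. Note that this sketch lowercases the whole URL, including any query string, which is exactly where the possible loss of information on case-sensitive servers comes from.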

Regards, Kees Bol
 
-- ==================================================
Mailto:[EMAIL PROTECTED] (!! bol@a,x,p,one,...)
Wageningen UR,  Dept. FB-ICT
Dreijenplein 2, 6703 HB  Wageningen, Netherlands 
Phone:+31(0)317-484715  Fax:+31(0)317-485360   
http://www.wau.nl
==================================================

