Hi,

I searched the archives for my problem but couldn't find a hint.

The problem is this: for a couple of weeks now we have been using htdig to index 100+ servers at our organisation. These servers are a mix of Unix and IIS (the latter not being case sensitive). So from the beginning I set case_sensitive to true, thinking that both server types would be indexed correctly. However, it seems that many of the IIS servers have quite large databases, with URLs in all kinds of upper/lowercase variants pointing to them. This results in an enormous growth in the number of pages read from those servers, for example:
../books/abstracts/xyz
../Books/abstracts/xyz
../books/Abstracts/xyz
etc.

The number of visits grew from about 5000 to ~25000. The solution would be case_sensitive=false, but then the Unix servers would be indexed incompletely.

Perhaps I don't fully understand how 'case_sensitive' works, but would it not be a solution if one could set case_sensitive=false, have every page fetched with the non-converted URL (exactly as found in a document), and do only the URL comparison that prevents multiple visits to the same page in lowercase? Of course this could lose information on the Unix servers, since two URLs that differ only in case would be treated as one page, but as far as I can see nothing would be lost in our case. I don't see the need to convert the URL to lowercase BEFORE fetching the page when case_sensitive=false.

Regards,
Kees Bol

--
Wageningen UR, Dept. FB-ICT
Dreijenplein 2, 6703 HB Wageningen, Netherlands
Phone: +31(0)317-484715  Fax: +31(0)317-485360
http://www.wau.nl
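
To make the proposal concrete, here is a rough sketch in Python of the behaviour I have in mind. This is not htdig's actual code: the function name dedupe_urls and the example.org host names are made up for illustration, and the case_sensitive parameter only mimics the meaning of the htdig attribute. The point is that the lowercase form is used solely as the key for the "already seen" check, while the URL that is actually fetched keeps its original spelling.

```python
def dedupe_urls(found_urls, case_sensitive=False):
    """Return the URLs to fetch, each in its original spelling.

    Hypothetical helper: when case_sensitive is False, two URLs that
    differ only in upper/lowercase count as the same page, but the
    first spelling encountered is kept unchanged for fetching.
    """
    seen = set()      # comparison keys (lowercased if not case sensitive)
    to_fetch = []     # URLs exactly as found in the documents
    for url in found_urls:
        key = url if case_sensitive else url.lower()
        if key in seen:
            continue  # same page already queued under another spelling
        seen.add(key)
        to_fetch.append(url)
    return to_fetch


# Made-up example URLs, modelled on the variants seen on the IIS servers.
found = [
    "http://iis.example.org/books/abstracts/xyz",
    "http://iis.example.org/Books/abstracts/xyz",
    "http://iis.example.org/books/Abstracts/xyz",
]
print(dedupe_urls(found))
print(dedupe_urls(found, case_sensitive=True))
```

With the comparison done in lowercase, only the first spelling is queued. With case_sensitive=True all three variants are queued, which is the growth in visits described above. A Unix server still receives the request in the original mixed-case form, so a page such as /Docs/ReadMe stays reachable.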

