Greetings Chris,

Wow!  I hadn't realised it was *that* much slower.  That is serious!
I'm not sure if any of these are applicable, but some possibilities are:
 - try different combinations of "head_before_get" and
   "persistent_connections" (and "max_connection_requests")
 - change max_retries, tcp_max_retries and tcp_wait_time
 - change timeout
 - change md5 settings (check_unique_md5, check_unique_date)
 - change the compression options.  If CPU-bound, removing
   compression should help.  If disk-bound, adding it should help.
 - if no URLs are "local" (non-http), ensure local_urls is empty
 - play with server_wait_time
 - reduce the data you're collecting by
    o setting doc_list and/or word_dump to empty
    o setting ignore_alt_text to true
    o reducing max_descriptions and max_description_length
    o reducing max_doc_size
    o reducing max_head_length
    o reducing max_keywords
    o reducing max_meta_description_length
    o adding more bad_words

A lot of these should make no difference, as they haven't changed
since 3.1.6.  However, if changing them *does* make a difference, it
may shed light on a possible bug.

It would also be very helpful if you could gather some information
about what is taking the time.  Is the CPU usage high?  Is the disk
usage high?  Is the network traffic higher than under 3.1.6?  Is the
indexing of local documents slowed down, or just http documents?

Thanks for the feedback,
Lachlan

On Sat, 31 Jan 2004 06:34, Christopher Murtagh wrote:
> Well, as I'm finishing up our new search tool, I just did my first
> index over http today (the majority of what I was working on
> involved indexing small local files). I was surprised at how slow
> the spidering/indexing really was.
>
> It has taken about 11 hours to index 10k pages so far. In my last
> dig under 3.1.6, I did 30k+ pages in 1 hour and 41 minutes!
> ...'wordlist_cache_size: 100000000'...
>
> Any quick tips/optimizations that anyone can think I
> might try before I continue?
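P.S.  As a starting point for the one-change-at-a-time comparisons
suggested above, a trial block of overrides might look like the
following.  It only uses attributes already mentioned in this thread;
the values are illustrative assumptions, not tuned recommendations, so
adjust one group per test dig and compare the elapsed time against
your current configuration.

```
# Hypothetical trial overrides for a single test dig.
# Values are illustrative only; change one group at a time.

# Connection handling: skip the HEAD request, reuse connections
head_before_get:         false
persistent_connections:  true

# All URLs are http, so no local filesystem mapping
local_urls:

# Collect less data per document
ignore_alt_text:         true
max_descriptions:        1
max_doc_size:            50000
max_head_length:         256
max_keywords:            10
```

If one group noticeably changes the dig time, that narrows down where
the 3.1.6 to 3.2 slowdown is coming from.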
-- 
[EMAIL PROTECTED]
ht://Dig developer DownUnder  (http://www.htdig.org)

_______________________________________________
ht://Dig Developer mailing list: [EMAIL PROTECTED]
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev