Greetings Chris,

Wow!  I hadn't realised it was *that* much slower.  That is serious!

I'm not sure if any of these are applicable, but some possibilities 
are:

- try different combinations of "head_before_get" and
  "persistent_connections" (and "max_connection_requests")
- change  max_retries,  tcp_max_retries  and  tcp_wait_time
- change  timeout
- change  md5  settings (check_unique_md5, check_unique_date)
- change the compression options.  If CPU-bound, removing compression
  should help.  If disk-bound, adding it should help.
- if no URLs are "local" (non-http), ensure  local_urls  is empty
- play with server_wait_time
- reduce the data you're collecting by
  o setting  doc_list  and/or  word_dump  to empty
  o setting  ignore_alt_text  to true
  o reducing  max_descriptions  and  max_description_length
  o reducing  max_doc_size
  o reducing  max_head_length
  o reducing  max_keywords
  o reducing  max_meta_description_length
  o adding more  bad_words
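
To make the first suggestion less tedious, here is a rough sketch that
writes out one small config per combination of  head_before_get  and
persistent_connections.  The attribute names are the ones listed
above; the base config path, the labels and the helper name are
hypothetical, and I am assuming that an  include:  line followed by an
override behaves the way you would expect (later setting wins).

```python
from itertools import product

# Attribute names come from the list above; the values tried here are
# illustrative assumptions, not recommended settings.
TOGGLES = {
    "head_before_get": ("true", "false"),
    "persistent_connections": ("true", "false"),
}


def variant_configs(base_conf):
    """Return {label: config text}, one entry per combination of TOGGLES.

    Each variant includes the base config and then overrides only the
    toggled attributes (assumes a later setting overrides an earlier one).
    """
    names = list(TOGGLES)
    variants = {}
    for values in product(*(TOGGLES[n] for n in names)):
        label = "_".join(f"{n}-{v}" for n, v in zip(names, values))
        lines = [f"include: {base_conf}"]
        lines += [f"{n}: {v}" for n, v in zip(names, values)]
        variants[label] = "\n".join(lines) + "\n"
    return variants


if __name__ == "__main__":
    # "/path/to/htdig.conf" is a placeholder for your real config file.
    for label, text in variant_configs("/path/to/htdig.conf").items():
        print(f"# {label}\n{text}")
```

Digging a small, fixed set of URLs with each variant and comparing
the run times should show whether either attribute matters.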

A lot of these should make no difference, as they haven't changed 
since 3.1.6.  However, if changing them *does* make a difference, it 
may shed light on a possible bug.

It would also be very helpful if you could gather some information 
about what is taking the time.  Is the CPU usage high?  Is the disk 
usage high?  Is the network traffic higher than under 3.1.6?  Is the 
indexing of local documents slowed down, or just http documents?
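
If it helps, one crude way to answer the CPU question is to compare
wall-clock time with the CPU time the dig itself consumed.  The sketch
below is only that, a sketch: it is Unix-only (it uses Python's
resource  module), the function name is made up, and the command you
pass on the command line is whatever you normally run.  A CPU time
close to the wall time suggests CPU-bound; a CPU time far below it
suggests the process is mostly waiting on the network or the disk.

```python
import resource
import subprocess
import sys
import time


def profile_command(argv):
    """Run argv and return (wall_seconds, cpu_seconds) for the child."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(argv, check=False)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # User plus system CPU time accumulated by waited-for child processes.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall, cpu


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: profile.py COMMAND [ARGS...]")
    else:
        wall, cpu = profile_command(sys.argv[1:])
        print(f"wall: {wall:.1f}s  cpu: {cpu:.1f}s")
```

Running the same small dig under 3.1.6 and under the new version and
comparing the two pairs of numbers would narrow things down a lot.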

Thanks for the feedback,
Lachlan

On Sat, 31 Jan 2004 06:34, Christopher Murtagh wrote:
>  Well, as I'm finishing up our new search tool, I just did my first
> index over http today (the majority of what I was working on
> involved indexing small local files). I was surprised at how slow
> the spidering/indexing really was.
>
>  It has taken about 11 hours to index 10k pages so far. In my last
> dig under 3.1.6, I did 30k+ pages in 1 hour and 41 minutes!
> ...'wordlist_cache_size: 100000000'...
>
> Any quick tips/optimizations that anyone can think of that I
> might try before I continue?

-- 
[EMAIL PROTECTED]
ht://Dig developer DownUnder  (http://www.htdig.org)


_______________________________________________
ht://Dig Developer mailing list:
[EMAIL PROTECTED]
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev
