Hi,
Geoff Hutchison wrote:
> 
> At 12:26 PM +0800 10/28/99, Toivo Pedaste wrote:
> >I've been running htdig over our campus web and it is up
> >to about 40000 pages and it is spending the great majority
> >of its times seeking on db.words.db

> 
> I guess I'm not totally surprised. So far, I don't think we've had
> many performance reports on 3.2, mostly because we're still working
> on features and bug fixes.
> 
> >What size databases do people use successfully?
> >Is there a way of tuning the db routines to speed
> >things up?
> 
> I don't know if anyone has run 3.2 on a substantial database--I've
> done at most 500-1000 pages. Running something like gprof would help
> us figure out where the time is spent.

I didn't run the latest snapshot but a previous one (I gave up), in one of
the worst cases: 3000 hosts on 100 Mb ethernet, a big pipe, no latency,
just parsing and saving.
I'm afraid gprof wouldn't help; what will it say when the CPU load drops
from 99% to below 1%? The time goes to waiting on disk, not computing.
IMO, digging in a single pass is an idea that looks good but works out
badly (at least for now, and for an initial dig).
I didn't really read the db code, but to me it's just plain cache
thrashing: as long as the db size fits in RAM it's OK, but after that...
There's almost no chance that two different words will land in the same
leaf node.
With the one-doc-per-host policy, htdig keeps pushing unrelated documents,
so there are even fewer cache hits.
A page miss is expensive, and the records are very small.
And so on.
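
Just to put rough numbers on the cache-thrashing point, here is a little
back-of-envelope sketch. The page size, record size, words per document,
cache size and seek cost below are my guesses, not measured htdig figures:

// Rough model, not htdig code: estimates how many leaf-page misses a
// random-order word insertion pattern causes once the B-tree no longer
// fits in the db cache.  All constants are assumptions for illustration.
#include <cstdio>
#include <algorithm>

int main() {
    const double page_size   = 8192.0;              // assumed leaf page size (bytes)
    const double record_size = 24.0;                // assumed size of one word record (bytes)
    const double n_records   = 40000.0 * 300.0;     // ~40000 docs, ~300 words each (guess)
    const double cache_bytes = 64.0 * 1024 * 1024;  // assumed db cache of 64 MB

    const double records_per_page = page_size / record_size;
    const double leaf_pages  = n_records / records_per_page;
    const double cache_pages = cache_bytes / page_size;

    // With words arriving in random order, each insert hits an essentially
    // random leaf page, so the miss probability is roughly the fraction of
    // leaf pages that are not in the cache.
    const double miss_prob = std::max(0.0, 1.0 - cache_pages / leaf_pages);
    const double misses    = n_records * miss_prob;
    const double seek_cost = 0.010;                 // ~10 ms per page fetch (assumed)

    std::printf("leaf pages: %.0f, cache pages: %.0f\n", leaf_pages, cache_pages);
    std::printf("miss probability per insert: %.2f\n", miss_prob);
    std::printf("estimated seek time: %.1f hours\n", misses * seek_cost / 3600.0);
    return 0;
}

With those guesses it comes out to tens of hours spent seeking, which is
why the load drops to almost nothing once the db outgrows RAM.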

Didier
