On Wed, 23 Feb 2000, Geoff Hutchison wrote:

> Date: Wed, 23 Feb 2000 08:31:22 -0600
> From: Geoff Hutchison <[EMAIL PROTECTED]>
> To: J Kinsley <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED]
> Subject: Re: [htdig] ht://Dig 3.2.0b1 and 3.2.0b2-022000 Extremely Slooooow

OK, I shall attempt to provide some hard numbers here showing the
index speed difference between 3.2.0b2-022000 and 3.1.2.  First,
though, I will clear up the RPM confusion.  When I installed the beta
series, I used RPM to build and install it.  However, last spring,
when I first installed 3.1.2, I did not use RPM, and the binaries went
into /opt/www/bin.  When installing the beta RPM, I moved the binary
location to /opt/www/sbin, and since 3.1.2 had been installed manually,
RPM did not remove its binaries.  The first time I built the index two
days ago, I called htdig from the command line, and the 3.1.2 binaries
were used instead of the betas.  I did not realize this until I tried
to determine why htsearch (whose 3.1.2 version had been overwritten by
the beta version) failed to recognize the database.  Although I had
previously installed ht://Dig, I had never used it due to disk space
limitations.

Anyway, on with the numbers....

Server: 
        Intel PII 233MHz 
        64MB SDRAM 
        Kernel 2.2.14 
        Customized RedHat 6.0-6.2
        Apache 1.3.6

Archive: 
        44101 Files - 1290 Directories
                Smallest:  190 B
                Largest:  9.40 MB
                Average: 30.20 KB
                Total:    1.35 GB

NOTE: ht://Dig is running on the same physical host as the web server
it is indexing, so network bandwidth is not a factor here.


ht://Dig version: 3.1.2

        htdig -l -s -v -c /etc/www/htdig/bti.conf > /tmp/htdig.log 2>&1

        Index time:  01:52:00
        Index size:  634MB wordlist
                     325MB documents
        URLs indexed according to /tmp/htdig.log:  52100
        (higher than the file count because the ?[MNSD]=[AD] URLs
        for each directory listing were also indexed)

        CPU time:    00:39:00
        RSS:         unknown

        htmerge -c /etc/www/htdig/bti.conf
        Merge time:  00:42:00
        Index size:  504MB wordlist.db
        CPU time:    unknown
        RSS:         unknown

        Note:  the above numbers are from my memory and thus are
        close approximations.



ht://Dig version: 3.2.0b1

        htdig -l -s -v -c /etc/www/htdig/bti.conf > /tmp/htdig.log 2>&1

        Stopped after 3 hours (~2200 files) in an attempt to speed things up



ht://Dig version: 3.2.0b2-022000

        htdig -l -s -v -c /etc/www/htdig/bti.conf > /tmp/htdig.log 2>&1


        Index time:
                Start:    Feb 23 05:07:18 EST 2000
                Current:  Feb 23 16:38:25 EST 2000
                Est. End: Feb 24 10:00:00 EST 2000
        URLs processed according to /tmp/htdig.log:  19111
        CPU time:  00:52:42
        RSS:       31MB
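A minimal Python sketch that derives a few rates from the 3.2.0b2-022000 figures above: the wall time elapsed between the Start and Current readings, the average number of URLs processed per minute, and CPU time as a fraction of wall time. It assumes both timestamps are in the same zone (EST) and that "URLs processed" counts every URL handled so far; the variable names are hypothetical.

```python
from datetime import datetime, timedelta

# Figures copied from the 3.2.0b2-022000 run; variable names are hypothetical.
# Both timestamps are EST, so naive datetimes give the right difference.
start = datetime(2000, 2, 23, 5, 7, 18)
current = datetime(2000, 2, 23, 16, 38, 25)
urls_processed = 19111
cpu_time = timedelta(minutes=52, seconds=42)

# Wall-clock time elapsed between the Start and Current readings.
elapsed = current - start
elapsed_min = elapsed.total_seconds() / 60

# Average throughput so far, and the share of wall time spent on CPU.
urls_per_min = urls_processed / elapsed_min
cpu_fraction = cpu_time / elapsed  # timedelta / timedelta -> float

print(f"elapsed: {elapsed}")                        # elapsed: 11:31:07
print(f"rate:    {urls_per_min:.1f} URLs/min")      # rate:    27.7 URLs/min
print(f"cpu:     {cpu_fraction:.1%} of wall time")  # cpu:     7.6% of wall time
```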



<snip>

> Now, as far as the speed of indexing in 3.2.0b1 (and current 
> snapshots), I probably need to make this a FAQ. Right now, it's 
> probably not going to be faster than 3.1.x versions and is quite 
> likely to be slow. We rewrote the whole layout of databases and in 
> the process made quite a few trade-offs against the indexer.

Using my estimated end time above, we're looking at a 27-hour
increase in index time on ~50,000 URLs.  I do not think this is what
you mean by 'a few trade-offs', so I am guessing it is a bug.  Although
I do not fully understand how to detect memory leaks, I suspect that is
the problem.  When I first start htdig, it indexes the first 1000
URLs in about 6 minutes; then the RSS creeps up to around 18-19MB and
it starts to slow down.
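A minimal Python sketch of the arithmetic behind the 27-hour figure and the slowdown, using only numbers from this message. It takes the 3.1.2 index time of 01:52:00 (quoted from memory) and the estimated end time (a projection, not a measurement) at face value, so the results are only as good as those inputs; the variable names are hypothetical.

```python
from datetime import datetime, timedelta

# Inputs from this message; variable names are hypothetical.
index_312 = timedelta(hours=1, minutes=52)      # 3.1.2 index time (from memory)
start_b2 = datetime(2000, 2, 23, 5, 7, 18)      # 3.2.0b2-022000 start
current_b2 = datetime(2000, 2, 23, 16, 38, 25)  # reading when 19111 URLs were done
est_end_b2 = datetime(2000, 2, 24, 10, 0, 0)    # estimated end (a projection)

# Projected total index time for the beta, and how it compares to 3.1.2.
total_b2 = est_end_b2 - start_b2
increase_hours = (total_b2 - index_312).total_seconds() / 3600
slowdown = total_b2 / index_312                 # timedelta / timedelta -> float

# Throughput over the first 1000 URLs vs. the average up to the Current reading.
early_rate = 1000 / 6                           # URLs per minute
average_rate = 19111 / ((current_b2 - start_b2).total_seconds() / 60)

print(f"projected total: {total_b2}")           # projected total: 1 day, 4:52:42
print(f"increase: {increase_hours:.1f} h, {slowdown:.1f}x longer")  # increase: 27.0 h, 15.5x longer
print(f"{early_rate:.0f} vs {average_rate:.1f} URLs/min")           # 167 vs 27.7 URLs/min
```

In relative terms, the projected beta run is roughly 15 times longer than the 3.1.2 run, and the early rate is about six times the average rate so far.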

<snip>

> But the important thing to remember is that these are *betas*--we're 
> looking for feedback. We'd love to have accurate performance and 
> requirement feedback. The new database layout is probably going to 
> require more disk space (especially if compression is off), but you 
> won't need as much memory for htmerge. So hard numbers would be 
> wonderful. This will help us target what needs improvement. Further, 
> if anyone wants to help improve indexing performance, I'm sure we can 
> come up with a list.


ht://Dig is just one of many bleeding-edge packages I currently
have installed, so I'll do what I can to help solve the problems.


J. Kinsley

