Today at work, I went ahead and finished writing my
new program to index the directories separately and merge them together (as
opposed to indexing them as a single entity).
It used to take 50-70 minutes to index everything
from scratch. So far, it's been running for two hours (digging the 9 directories
took 45 minutes), so htmerge has been at it for almost an hour and fifteen
minutes.
It looks like indexing them as separate directories
and merging them together wasn't such a great idea after all.
One thought I'm having is that maybe it would be
better if I changed the merge order to take the relative database sizes into
account. Is htmerge most efficient at merging small databases into big
databases, big databases into small databases, or databases that are
approximately equal in size?
If it helps, here is the number of seconds each
database took to create using htdig:
1 : 106
2 : 0
3 : 1405
4 : 307
5 : 443
6 : 153
7 : 192
8 : 75
9 : 2
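As a quick sanity check on these numbers, the per-directory times can be summed. The total comes to 2683 seconds, or about 44.7 minutes, which matches the roughly 45 minutes the 9 digs took. Directory 3 alone accounts for just over half of that. This is only arithmetic on the table above; the dict name is illustrative.

```python
# Seconds htdig spent building each per-directory database (from the table above).
dig_seconds = {1: 106, 2: 0, 3: 1405, 4: 307, 5: 443, 6: 153, 7: 192, 8: 75, 9: 2}

total = sum(dig_seconds.values())
print(f"total: {total} s = {total / 60:.1f} min")

# Directories ordered from largest to smallest dig time.
for directory, secs in sorted(dig_seconds.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{directory}: {secs} s ({secs / total:.0%})")
```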
alldb.* is the new database into which the other
databases are being merged.
db.* is the database that I created as a single
entity before trying out the new program.
The current run of htmerge is merging 1 into all, 2
into all, 3 into all, 4 into all, 5 into all, etc. I'm not sure how far along it
is, but here is the current size of the files:
-rw-r--r--  1 nobody  nobody  164291584 Feb  2 16:56 alldb.docdb
-rw-r--r--  1 nobody  nobody    4179968 Feb  2 16:56 alldb.docs.index
-rw-r--r--  1 nobody  nobody  222506467 Feb  2 16:58 alldb.wordlist
-rw-r--r--  1 nobody  nobody          0 Feb  2 16:58 alldb.wordlist.new
-rw-r--r--  1 nobody  nobody       2048 Feb  2 16:58 alldb.words.db
-rw-r--r--  1 nobody  nobody  219489280 Feb  2 13:54 db.docdb
-rw-r--r--  1 nobody  nobody    5030912 Feb  2 13:54 db.docs.index
-rw-r--r--  1 nobody  nobody  324586385 Feb  2 13:49 db.wordlist
-rw-r--r--  1 nobody  nobody  264422400 Feb  2 13:49 db.words.db
The run began at 15:05, and it's currently 17:11.
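One crude way to gauge how far along the merge is: compare each alldb.* file with its db.* counterpart. This sketch assumes, without confirmation, that the finished merged files end up roughly the same size as the single-entity ones. Under that assumption, alldb.wordlist is at about 69%, alldb.docdb at about 75%, and alldb.docs.index at about 83%, after 126 minutes of total run time. alldb.words.db is left out because it is still only 2048 bytes, so a ratio for it says little.

```python
from datetime import datetime

# File sizes in bytes, copied from the ls -l listing above.
merged = {"docdb": 164291584, "docs.index": 4179968, "wordlist": 222506467}
single = {"docdb": 219489280, "docs.index": 5030912, "wordlist": 324586385}

# Assumption: finished alldb.* files are roughly the size of db.*,
# so these ratios are only a rough progress indicator.
for name in merged:
    ratio = merged[name] / single[name]
    print(f"{name}: {ratio:.0%} of the single-entity size")

# Elapsed wall-clock time for the whole run (dig + merge).
elapsed = datetime.strptime("17:11", "%H:%M") - datetime.strptime("15:05", "%H:%M")
print(f"elapsed: {elapsed.total_seconds() / 60:.0f} min")
```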
Any suggestions as to what I should try doing
next?
-- Jeff

