Today at work, I went ahead and finished writing my new program to index the directories separately and merge them together (as opposed to indexing them as a single entity).
 
It used to take 50-70 minutes to index everything from scratch. So far, the new program has been running for two hours. Running htdig on the 9 directories took 45 minutes, so htmerge has been at it for almost an hour and fifteen minutes.
 
It looks like indexing them as separate directories and merging them together wasn't such a great idea after all.
 
One thought I'm having is that maybe it would be better if I changed the merge order to take the relative database sizes into account. Is htmerge most efficient at merging small databases into big databases, big databases into small databases, or databases that are approximately equal in size?
 
If it helps, here is the number of seconds each database took to create using htdig:
 
1 : 106
2 : 0
3 : 1405
4 : 307
5 : 443
6 : 153
7 : 192
8 : 75
9 : 2
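
For reference, here is a small Python sketch that totals these times and lists the two size-based merge orders I could try. It assumes that htdig build time is a reasonable stand-in for database size, which I have not verified, and it says nothing about which order htmerge actually prefers.

```python
# Seconds htdig spent building each per-directory database (from the list above).
htdig_seconds = {1: 106, 2: 0, 3: 1405, 4: 307, 5: 443, 6: 153, 7: 192, 8: 75, 9: 2}

total = sum(htdig_seconds.values())
print(f"total htdig time: {total} s ({total / 60:.1f} min)")

# ASSUMPTION: build time is used as a proxy for database size.
largest_first = sorted(htdig_seconds, key=htdig_seconds.get, reverse=True)
smallest_first = sorted(htdig_seconds, key=htdig_seconds.get)

print("largest first: ", largest_first)
print("smallest first:", smallest_first)
```

The total works out to 2683 seconds, which matches the roughly 45 minutes of htdig time mentioned earlier.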
 
alldb.* is the new database into which the other databases are being merged.
db.* is the database that I created as a single entity prior to trying out the new program.
 
The current run of htmerge is merging 1 into all, then 2 into all, then 3 into all, and so on through 9. I'm not sure how far along it is, but here are the current file sizes:
-rw-r--r--   1 nobody   nobody   164291584 Feb  2 16:56 alldb.docdb
-rw-r--r--   1 nobody   nobody    4179968 Feb  2 16:56 alldb.docs.index
-rw-r--r--   1 nobody   nobody   222506467 Feb  2 16:58 alldb.wordlist
-rw-r--r--   1 nobody   nobody          0 Feb  2 16:58 alldb.wordlist.new
-rw-r--r--   1 nobody   nobody       2048 Feb  2 16:58 alldb.words.db
-rw-r--r--   1 nobody   nobody   219489280 Feb  2 13:54 db.docdb
-rw-r--r--   1 nobody   nobody    5030912 Feb  2 13:54 db.docs.index
-rw-r--r--   1 nobody   nobody   324586385 Feb  2 13:49 db.wordlist
-rw-r--r--   1 nobody   nobody   264422400 Feb  2 13:49 db.words.db
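
One very rough way to guess at progress is to compare the alldb.* sizes with the corresponding db.* sizes. The Python sketch below does that. It assumes the finished merged files will end up about as large as the single-entity ones, which may not hold, and it skips alldb.words.db because that file has evidently not been built yet.

```python
# File sizes in bytes, copied from the listing above.
merged = {"docdb": 164291584, "docs.index": 4179968, "wordlist": 222506467}
single = {"docdb": 219489280, "docs.index": 5030912, "wordlist": 324586385}

# ASSUMPTION: the finished alldb.* files end up about as large as db.*.
progress = {name: merged[name] / single[name] for name in merged}

for name, ratio in progress.items():
    print(f"{name:<10} {ratio:6.1%}")
```

The ratios only indicate how much data has been written so far, not how much time remains.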
 
The run began at 15:05, and it's currently 17:11.
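
To put a number on the htmerge portion, here is a short Python sketch of the arithmetic. It assumes that everything other than the nine htdig runs (2683 seconds in total) has been htmerge time, which ignores any overhead in my own program.

```python
from datetime import datetime

# Wall-clock times from this message (same day).
start = datetime.strptime("15:05", "%H:%M")
now = datetime.strptime("17:11", "%H:%M")
elapsed_min = (now - start).total_seconds() / 60

# Sum of the nine per-database htdig times listed earlier, in minutes.
htdig_min = 2683 / 60

# ASSUMPTION: all remaining time was spent in htmerge.
htmerge_min = elapsed_min - htdig_min
print(f"elapsed: {elapsed_min:.0f} min, htmerge so far: {htmerge_min:.0f} min")
```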
 
Any suggestions as to what I should try doing next?
 
-- Jeff
