After looking through the various indexing tools on SourceForge, I have
selected Htdig 3.1.6 for a trial run on FreeBSD 4.9, and I've installed
it from the 3.1.6 port.

<background>
I plan to build a search DB covering a number of supercomputer sites as
part of my web site's information on supercomputers (and other subjects).
The typical supercomputer site is quite large.

I'm currently running 3.1.6 on an old machine as a test, with a list of
ten sites and no outside links. As of this writing, htdig has used about
10.5 hours of CPU time, with a size of 126 MB and a resident size of
83 MB, per the top command. The DB of docs (10K max per document) is
190 MB and the wordlist is 183 MB.

It has become quite clear (duh) that the system is badly I/O limited,
but that's what this test was all about. I'm running on an obsolete
Cyrix 300 with 128 MB of RAM, plus 2 x 2GB F&W SCSI disks. That it is
still running speaks well of your design and code.

Unfortunately, I failed to use -v in the run, so I have no idea how much
longer this will run. Since it is now swapping, that will slow things
more. Elapsed time is close to 24 hours. 
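
A rough check of those numbers, as a minimal Python sketch. It assumes
the top figures describe the htdig process alone and treats the
approximate 24 hours as exact, so read the results as ballpark only:

```python
# Figures quoted above, as read from top; elapsed time is approximate.
cpu_hours = 10.5
elapsed_hours = 24.0
process_size_mb = 126
resident_mb = 83
ram_mb = 128

# Share of wall-clock time the process actually spent on the CPU.
cpu_share = cpu_hours / elapsed_hours

# Part of the process image that is not currently held in RAM
# (not necessarily all swapped out, but a hint of memory pressure).
not_resident_mb = process_size_mb - resident_mb

print(f"CPU busy for about {cpu_share:.0%} of elapsed time")
print(f"{not_resident_mb} MB of the process is not resident")
print(f"Process size is {process_size_mb / ram_mb:.0%} of physical RAM")
```

With the CPU busy well under half the time and the process nearly as
large as physical RAM, the numbers fit a run that is waiting on disk.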
</background>

I plan to install two more drives and use them as follows:
        * One 2G for the wordlist
        * One 4G for the doc db.
One of the current 2G drives will be used for temp sort space, with
about 1.5 G available. That's where the whole DB is now, and it is very
busy. Any fuzzy db will go on this drive, so that I have the DB spread
across three drives on two channels.

A faster system with more memory will be used when the $$ become
available. Most of the parts are already on hand; I need only a new ATX
box with room for several drives.

My next step is to get 3.2b5 running on FreeBSD, then test with the
added disks plus many config enhancements. I'll run the same 10 URLs
again for comparison, but since I plan to capture much more of each
document (30K) and may add PDF and Word translation, the DB will be
much larger and the processor time much longer.

Request 1:
I'd like feedback from anyone who has run a large DB, say 500MB or
bigger, and any warnings or advice on setup, configuration, etc. 

Request 2:
Please wrap up 3.2 final so I can start using it in production. Please
make the concurrent search and index capability your top priority for
version 3.3.

Comment on Htdig:
I'd like to compliment the folks that have built this system. There are
simply a lot of very nice things about the design, and you get an A on
documentation. The compress and other features in 3.2 will be essential
for my needs. I think my plan for a large DB built with Htdig is likely
to become more common as folks find this nicely designed software.

Thanks,
Bill Nicholls
http://www.billswrite.com