On Tue, 21 Sep 2004, Aaron wrote:

How long should the initial database build take (ballpark)? The machine is a dual G4 1.2 GHz, and we are talking about over 500,000 documents. Is this an hour? Two? Twelve? Days? After I sent the email last night, I started a fresh rundig and included all of the documents. It's still running htdig and hasn't even gotten to htnotify, so it's been running about 8 hours. Is this to be expected?

I don't find that surprising for that many documents. However, providing
even a ballpark figure for the total indexing time is difficult because of
the number of factors involved. Document size, network performance, and
htdig configuration settings can all have a major impact on the time
required. The amount of RAM can also make a huge difference if there is
not enough to avoid swapping.


In the future you might try supplying the -v option, which will give you a
little feedback on progress. The only other thing I can think of to
suggest is that you try running against some smaller, representative
samples of the full document collection. You might be able to extrapolate
something useful from that.
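
As a rough illustration of that kind of extrapolation, here is a minimal
Python sketch. It assumes you have timed a few sample runs yourself and
that indexing time scales roughly linearly with document count, which is
only an approximation; if the machine starts swapping on the full run, the
real figure could be much worse. The function name and the sample numbers
are hypothetical, not measurements from any real run.

```python
def estimate_total_seconds(samples, total_docs):
    """Linearly extrapolate indexing time from timed sample runs.

    samples    -- list of (document_count, elapsed_seconds) pairs, one per
                  sample run (you measure these yourself)
    total_docs -- number of documents in the full collection
    """
    docs = sum(count for count, _ in samples)
    secs = sum(elapsed for _, elapsed in samples)
    if docs <= 0:
        raise ValueError("samples must cover at least one document")
    # Pooled average seconds per document, scaled to the full collection.
    return total_docs * secs / docs


if __name__ == "__main__":
    # Hypothetical timings: 1,000 docs in 60 s and 5,000 docs in 300 s.
    timings = [(1000, 60.0), (5000, 300.0)]
    hours = estimate_total_seconds(timings, 500000) / 3600.0
    print("Estimated full run: about %.1f hours" % hours)
```

Treat the result as a lower bound rather than a prediction; it says
nothing about network stalls or memory pressure on the full run.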

Also be aware that there is a 2 GB limit on the size of some of the files
involved with index creation. Exceeding that limit will kill your indexing
run.
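
If you want to keep an eye on that limit during a long run, something
like the following Python sketch could be run from another shell. It
simply lists files in a directory whose size is at or above a chosen
fraction of the limit. The function name and the 80% warning fraction are
my own choices, and I am treating "2 GB" as 2 * 1024^3 bytes, which is an
assumption. Point it at whatever directory holds your database files.

```python
import os

# The 2 GB limit mentioned above, taken here as 2 GiB (an assumption).
LIMIT_BYTES = 2 * 1024 ** 3


def files_near_limit(directory, fraction=0.8, limit=LIMIT_BYTES):
    """Return (name, size) pairs for files at or above fraction * limit."""
    threshold = fraction * limit
    hits = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Only regular files are checked; subdirectories are skipped.
        if os.path.isfile(path):
            size = os.path.getsize(path)
            if size >= threshold:
                hits.append((name, size))
    return hits
```

Calling files_near_limit() on the database directory every so often
should give some warning before a file actually reaches the limit.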

Jim


