Over the last few days, I have tried various configurations to get htdig to index the huge number of documents I have. The best results so far have come from creating multiple databases, each with its own config file. I have also created files with lists of links for $start_url; the start files vary in size, however.
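Each per-database config looks roughly like this (the database paths are the ones from my setup; the list filename here is illustrative):

```
# conf/script-users.conf -- one config file per database
database_dir:  /Volumes/ngs/app/listsp/htdig/db/script-users
# backquotes tell htdig to read the attribute value from the named file
start_url:     `/Volumes/ngs/app/listsp/htdig/lists/script-users.urls`
```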

It seems that htdig sucks up lots of RAM while indexing and never gives it back. I wrote a little script that runs htdig in a for loop, indexing the files in each list one after the other. After a while, it simply runs out of memory. These are the results on a dual 2GHz G5 Apple Xserve with 2GB of RAM:

*** malloc: vm_allocate(size=15523840) failed (error code=3)
*** malloc[1036]: error: Can't allocate region
/Volumes/ngs/app/listsp/htdig/bin/rundig: line 36: 1036 Abort trap $BINDIR/htdig -i $opts $stats $alt
htmerge: Unable to open word list file '/Volumes/ngs/app/listsp/htdig/db/script-users/db.wordlist'.
Did you index anything?
Check your config file and try running htdig again.


DB2 problem...: /Volumes/ngs/app/listsp/htdig/db/script-users/db.docdb: No such file or directory

This happened with only about 60,000 documents, and nothing else running on the box. Any suggestions? Do you think trying out one of the 3.2 betas might help?
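In case it helps, the loop in my wrapper is essentially this sketch (the function name is made up for illustration; my actual rundig wrapper passes a few more options):

```shell
#!/bin/sh
# run_indexers: run htdig once per config file under $2, using the
# htdig binary in directory $1.  Sketch only -- flags may differ.
run_indexers() {
    bindir=$1; confdir=$2
    for conf in "$confdir"/*.conf; do
        [ -f "$conf" ] || continue        # skip if the glob matched nothing
        "$bindir/htdig" -i -c "$conf"     # -i: build a fresh (initial) index
    done
}
```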

Thanks!

-=Aaron



_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general