I've been crawling a rather large site, and I get:

529573:383196:16:http://xxx.xxx.xxx:80/programs/people: ********------***-***
/opt/pkg/htdig-3.1.6/bin/rundig: line 36: 11760 File size limit exceeded (core dumped) $BINDIR/htdig -i $opts $stats $alt
htmerge: Sorting...
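"File size limit exceeded" is the shell reporting that the process was killed by SIGXFSZ, which can come either from a shell ulimit or from hitting a large-file ceiling. A quick way to tell the two apart is to check RLIMIT_FSIZE and the width of off_t directly. This is a minimal standalone sketch (an illustration, not htdig code), assuming glibc on Linux:

    /*
     * lfs-check.c: print the process file-size limit and the width of
     * off_t.  If RLIMIT_FSIZE is unlimited but off_t is only 32 bits,
     * the 2 GB ceiling is the likely culprit for SIGXFSZ on a growing
     * database file.
     *
     * Build both ways and compare:
     *   cc lfs-check.c -o lfs-check
     *   cc -D_FILE_OFFSET_BITS=64 lfs-check.c -o lfs-check64
     */
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_FSIZE, &rl) != 0) {
            perror("getrlimit");
            return 1;
        }

        if (rl.rlim_cur == RLIM_INFINITY)
            printf("RLIMIT_FSIZE: unlimited\n");
        else
            printf("RLIMIT_FSIZE: %llu bytes\n",
                   (unsigned long long) rl.rlim_cur);

        /* With 32-bit off_t, plain fseek/ftell take long offsets and
         * top out at 2^31 - 1 = 2147483647 bytes. */
        printf("off_t is %d bits\n", (int) (sizeof(off_t) * 8));

        return 0;
    }

Compiled plain, a 32-bit build gives a 32-bit off_t, so code using fseek/ftell stops at 2^31 - 1 bytes no matter what ulimit says; rebuilding with -D_FILE_OFFSET_BITS=64 (and using fseeko/ftello) lifts that, provided the kernel and filesystem support large files.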
That error happens at line 554533 or so in the log. The database files at that point:

drwxr-xr-x    6 root     root           4096 Apr 14 05:02 ..
-rw-r--r--    1 root     root     1688109056 Apr 15 04:15 db.docdb
-rw-r--r--    1 root     root       53672960 Apr 15 04:15 db.docs.index
-rw-r--r--    1 root     root     1942763113 Apr 15 03:25 db.wordlist
-rw-r--r--    1 root     root     1463583744 Apr 15 03:25 db.words.db

The machine has 1 GB of RAM (I could move that to 2) and 3 GB of swap.

Is it the old 2048 MB limit on file size? Linux supports longs for fseek and the like; is htdig limited to fseeking with ints? Is this a Berkeley DB limitation?

I'm using version 3.1.6, starting it with:

    time /opt/pkg/htdig-3.1.6/bin/rundig -s -v &> bill-dig.log

Does htdig 3.2.x have the same limitation? Are there any guidelines (I checked the FAQ) for the number of URLs that are reasonable to index?

BTW, for what it indexes, htdig seems to work very well; it's returning much better results than the existing search engine.

-- 
Bill Broadley
Mathematics/Institute of Theoretical Dynamics
UC Davis

