I've been crawling a rather large site, and I get:
529573:383196:16:http://xxx.xxx.xxx:80/programs/people: 
********------***-***/opt/pkg/htdig-3.1.6/bin/rundig: line 36: 11760 File size limit 
exceeded(core dumped) $BINDIR/htdig -i $opts $stats $alt
htmerge: Sorting...

That error happens at line 554533 or so in the log.

drwxr-xr-x    6 root     root         4096 Apr 14 05:02 ..
-rw-r--r--    1 root     root     1688109056 Apr 15 04:15 db.docdb
-rw-r--r--    1 root     root     53672960 Apr 15 04:15 db.docs.index
-rw-r--r--    1 root     root     1942763113 Apr 15 03:25 db.wordlist
-rw-r--r--    1 root     root     1463583744 Apr 15 03:25 db.words.db

The machine has 1 GB of RAM (I could bump that to 2) and 3 GB of swap.

Is it the old 2048 MB limit on file size?  Linux supports longs
for fseek and the like; is htdig limited to seeking with ints?

Is this a Berkeley DB limitation?

I'm using version 3.1.6, starting it with:
time /opt/pkg/htdig-3.1.6/bin/rundig -s -v &> bill-dig.log

Does Htdig 3.2.X have the same limitation?

Are there any guidelines (I checked the FAQ) for the number of URLs
that are reasonable to index?

BTW, for what it does index, htdig seems to work very well; it's returning
much better results than the existing search engine.

-- 
Bill Broadley
Mathematics/Institute of Theoretical Dynamics
UC Davis

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html