Dear all,

After many unsuccessful attempts I really hope that the htdig
community can help me. My problem is as follows:
I have quite a large server with more than 100000 PDFs on it. For
indexing I create an HTML file with links to all PDFs and use this file
as start_url. But now it seems that I have hit a magical 2 GByte limit,
because indexing (an htdig run) stops as soon as db.docdb reaches a size
of 2147483647 (2^31 - 1) bytes. I can see in the log files (htdig -vv)
that htdig simply stops and does not process the remaining PDFs.
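
To make the setup concrete, here is a minimal sketch of how such a start
file could be generated. It is an illustration only, not the script
actually used: the function name build_start_page, the directory layout
and the base URL are all assumptions.

```python
import html
import os
from urllib.parse import quote


def build_start_page(pdf_root: str, base_url: str) -> str:
    """Return an HTML page with one link per PDF found under pdf_root.

    pdf_root and base_url are hypothetical: the directory that holds the
    PDFs and the URL prefix under which the web server exposes it.
    """
    links = []
    for dirpath, _dirnames, filenames in os.walk(pdf_root):
        for name in filenames:
            if not name.lower().endswith(".pdf"):
                continue  # only PDFs belong on the start page
            rel = os.path.relpath(os.path.join(dirpath, name), pdf_root)
            # Percent-encode the path so spaces etc. yield valid URLs.
            url = base_url.rstrip("/") + "/" + quote(rel.replace(os.sep, "/"))
            links.append('<a href="%s">%s</a><br>'
                         % (html.escape(url, quote=True), html.escape(rel)))
    links.sort()  # stable output regardless of directory walk order
    return "<html><body>\n" + "\n".join(links) + "\n</body></html>\n"
```

The returned string would be written to a file served by the web server,
and the URL of that file would then be given as start_url.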

So far I have tried the following without success:
- installation of htdig 3.1.5/3.1.6 (self-compiled, i.e. no package)
- database directory on an ext2/ext3/ReiserFS partition
- kernel 2.4.10 (SuSE 7.3)
- kernel 2.4.21 (SuSE 9)

I've read in
http://www.geocrawler.com/mail/msg.php3?msg_id=9056546&list=8822 that
ReiserFS could be an option, but it didn't work for me. Besides, I can
easily create files bigger than 2 GByte even on an ext2 partition (I
verified that with a shell script). Htdig 3.2.0b5 is not really an
option since digging is slower than with 3.1.6 by a factor of ten (which
would mean ten full days of indexing), despite the possible
optimisations described in the FAQ.
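
The large-file check mentioned above can be sketched as follows. This is
not the shell script that was actually used; the function name
can_exceed_2gib is made up for illustration. It seeks just past the
2^31 - 1 byte mark in a temporary file and writes a single byte, which on
most Linux file systems produces a sparse file and therefore needs very
little real disk space.

```python
import os
import tempfile

LIMIT = 2**31 - 1  # 2147483647 bytes, the size at which db.docdb stops


def can_exceed_2gib(directory: str) -> bool:
    """Try to create a file larger than LIMIT bytes inside directory."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.seek(LIMIT + 1)  # position just beyond the suspected limit
            f.write(b"x")      # one byte makes the file LIMIT + 2 long
        return os.path.getsize(path) > LIMIT
    except (OSError, OverflowError):
        # The kernel, file system or a resource limit refused the offset.
        return False
    finally:
        os.remove(path)  # always clean up the probe file
```

A True result only shows that the kernel and the file system accept such
a file from a process that uses 64-bit file offsets. A program built
without large-file support could still fail on the same partition.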

I know that the maximum file size can depend on the architecture (I run
x86), but also on the kernel (older kernels had this 2 GByte limit, but
mine is brand new?!?).
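
The observed size is exactly the largest value a signed 32-bit integer
can hold, which is why a 32-bit file offset somewhere in the chain is a
natural suspect. Whether htdig or its database library actually uses
such an offset is an assumption here, not something confirmed. The
arithmetic itself can be checked with a short sketch (helper names are
made up for illustration):

```python
import struct

OBSERVED_SIZE = 2147483647  # size of db.docdb when htdig stops


def max_signed(bits: int) -> int:
    """Largest value representable in a signed integer of the given width."""
    return 2 ** (bits - 1) - 1


def fits_signed32(offset: int) -> bool:
    """True if offset can be stored in a signed 32-bit integer."""
    try:
        struct.pack("<i", offset)  # "<i" is a 4-byte signed int
        return True
    except struct.error:
        return False
```

With these helpers, max_signed(32) equals OBSERVED_SIZE, and
fits_signed32 accepts OBSERVED_SIZE but rejects OBSERVED_SIZE + 1, so a
file offset stored this way cannot address a single byte beyond the
point where indexing stops.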

What puzzles me is that the author of the post linked above could
overcome his problem simply by changing his file system, but I can't...

Any help is really appreciated.

Thanks,

Anton


_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general