> I'm going to try to answer both questions at the same time. The code itself
> doesn't put any real limit on the number of pages. Right now, I know of
> several sites with hundreds of thousands of pages. I haven't heard bitter
> complaints, so I assume they're fairly satisfied with performance. That's
> not to say we're not working on improving memory requirements and
> performance as much as possible. :-)
>
> As for practical limits, I would say it depends a lot on how many pages you
> plan on indexing. Some OSes limit files to 2GB in size, which can become a
> problem with a large database. Each of the programs also has slightly
> different limits. Right now htmerge performs a sort on the words indexed,
> and most sort programs use a fair amount of RAM and temporary disk space as
> they assemble the sorted list. The htdig program itself keeps quite a bit of
> information about the URLs it visits, in part so that each page is only
> indexed once, and that bookkeeping also takes RAM.
>
> It's not a great answer, but with cheap RAM, it never hurts to throw more
> memory at indexing larger sites. In a pinch, swap will work, but it
> obviously slows things down quite a bit.
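
As an illustration of why the URL bookkeeping mentioned above eats RAM (only a
sketch with made-up example.com URLs, not htdig's actual data structures):
keeping one in-memory entry per visited URL grows linearly with the number of
pages, so a site of several hundred thousand pages already costs tens of
megabytes just for the raw URL strings.

    // Sketch: memory cost of remembering every visited URL so that each
    // page is only fetched/indexed once. Hypothetical URLs, modern C++.
    #include <iostream>
    #include <string>
    #include <unordered_set>

    int main() {
        std::unordered_set<std::string> visited;   // one entry per URL already seen

        // Simulate a crawl of 500,000 distinct (made-up) URLs.
        for (int i = 0; i < 500000; ++i)
            visited.insert("http://www.example.com/page" + std::to_string(i) + ".html");

        // Rough lower bound: the URL strings plus per-string object overhead;
        // hash-table buckets and nodes come on top of this.
        std::size_t bytes = 0;
        for (const std::string &url : visited)
            bytes += url.size() + sizeof(std::string);

        std::cout << "URLs tracked: " << visited.size()
                  << ", at least ~" << bytes / (1024 * 1024) << " MB of RAM\n";
        return 0;
    }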
Is there some kind of debugging flag or similar option to see which limit
really causes the problem? I had the impression that memory+swap was not
really filled up, that is, htdig stops much earlier than that.
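
One thing that may be worth ruling out first (a sketch of a generic check, not
an htdig option): a per-process resource limit can stop a program long before
physical RAM plus swap is exhausted, e.g. a data-segment ceiling or the 2GB
file-size limit mentioned above. A small program like the following, or simply
`ulimit -a` in the shell that starts the dig, shows the limits in effect:

    // Sketch: print the resource limits a process inherits. If htdig dies
    // well before RAM+swap is full, one of these (or the 2GB file-size limit
    // on the databases) may be the real ceiling.
    #include <cstdio>
    #include <sys/time.h>
    #include <sys/resource.h>

    static void show(const char *name, int resource) {
        struct rlimit rl;
        if (getrlimit(resource, &rl) == 0) {
            if (rl.rlim_cur == RLIM_INFINITY)
                std::printf("%-13s soft=unlimited\n", name);
            else
                std::printf("%-13s soft=%llu bytes\n", name,
                            (unsigned long long)rl.rlim_cur);
        }
    }

    int main() {
        show("RLIMIT_DATA",  RLIMIT_DATA);   // heap (malloc/new) ceiling
        show("RLIMIT_AS",    RLIMIT_AS);     // total address space (where supported)
        show("RLIMIT_FSIZE", RLIMIT_FSIZE);  // largest file the process may write
        return 0;
    }

If one of the soft limits turns out to be low, raising it in the shell before
starting the dig may already be enough to get past the point where it stops.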
In the archives I found some references to a memory leak bug in some older
release. Can I assume that this bug has been fixed in 3.1.1 (which I'm
currently running)?
--
-- Jos Vos <[EMAIL PROTECTED]>
-- X/OS Experts in Open Systems BV | Phone: +31 20 6938364
-- Amsterdam, The Netherlands | Fax: +31 20 6948204