At the beginning of June we noticed that our index was getting too big. It was usually 300-400 MB; now it has swelled to over 3 GB and we don't know why. The total size of all the indexed files (HTML, PDF, PS, TXT) is about 2.3 GB. The files we're indexing have not changed, and htdig doesn't hang while indexing, but the dig now takes 8-9 hours longer!
Are you sure that there were no changes to any of the pages and no changes whatsoever to the directory structure? It is possible for a symbolic link or poorly formed hyperlink in a document to cause htdig to loop through a lot of bogus URLs, indexing some of the same documents over and over again. Simply adding a single link to a document also has the potential to pull in arbitrarily large portions of a site that were not previously indexed.
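To illustrate the looping symptom described above, here is a minimal sketch (not part of htdig) of one cheap heuristic for spotting it in a crawl log: a symlink like "docs/docs -> ." makes each crawl step append the same directory segment again, so the path grows without bound. The function name and the repeat threshold of 3 are my own assumptions, not anything htdig provides.

```python
# Hypothetical helper, not part of htdig: flag URLs whose path repeats
# the same segment suspiciously often, which is what a symlink loop or
# a bad relative link tends to produce. The threshold is arbitrary.
from urllib.parse import urlparse
from collections import Counter

def looks_like_loop(url: str, max_repeats: int = 3) -> bool:
    """Return True if any single path segment occurs more than max_repeats times."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments:
        return False
    return max(Counter(segments).values()) > max_repeats

print(looks_like_loop("http://example.com/docs/page.html"))                       # False
print(looks_like_loop("http://example.com/docs/docs/docs/docs/docs/page.html"))   # True
```

Running every URL from a verbose dig log through a check like this will quickly surface a runaway branch of the site.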
Are you certain that the start_url and limit_urls_to attributes have not changed in any way? Changes to either could allow more sites/directories to be indexed.
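For reference, these attributes live in htdig.conf and use the usual "attribute: value" syntax. The host name and paths below are placeholders, not taken from the original message:

```
# Hypothetical htdig.conf fragment -- server and paths are placeholders.
# Widening either attribute lets the dig pull in more of the site.
start_url:      http://www.example.com/
limit_urls_to:  http://www.example.com/docs/
```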
Are you reindexing from scratch, or performing updates? If the latter, it is possible that some sort of database corruption could be causing problems.
If you are indexing from scratch and can't think of anything else that has changed, you probably need to log the output of the dig and analyze it in order to determine where the problem might lie. If you are not already doing so, try running with the -s option to see if the number of indexed pages seems reasonable. You can also add one or more -v options in order to increase the verbosity of the output.
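Putting those suggestions together, an invocation along these lines would capture everything for later analysis. This is illustrative only; the config and log paths are placeholders (-i forces an initial dig from scratch, -c names the config file, and -s and -v are the statistics and verbosity options mentioned above):

```sh
# Re-dig from scratch with statistics and verbose output captured to a log.
htdig -i -s -v -c /etc/htdig/htdig.conf > /tmp/htdig.log 2>&1

# Rough count of URLs mentioned in the log, to compare against the
# number of documents you expect to be indexed.
grep -c 'http://' /tmp/htdig.log
```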
Jim
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html

