I have another post on this list that has yet to be answered:

Subject: HELP - Disaster with index!!! How to recover???

It has been a week without an answer, and I have to do something. Since I can't seem to get an answer to that post, maybe I can get one to this question, which would hopefully let me sidestep that problem altogether.

The problem described in that post is NOT a database size limit. I have checked all table sizes, and we are currently using only about 50% of the allowed MySQL table size limit.

I've written a Perl script that queries urlword, fetches all URLs with a status of 200, and writes them to a plain text file, one URL per line. The file contains 3,112,768 unique URLs.
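For anyone who wants to double-check a list like this before inserting it, here is a rough sketch of the kind of sanity check I mean. It only counts total, unique, and blank lines in the list file (myurls.txt in the command below). The function name check_urls is made up for illustration and is not part of aspseek.

```shell
#!/bin/sh
# Sanity-check a one-URL-per-line file before inserting it.
# check_urls is a hypothetical helper name, not an aspseek tool.
check_urls() {
    file="$1"
    # Total number of lines in the file.
    total=$(wc -l < "$file" | tr -d ' ')
    # Number of distinct lines; should equal total if every URL is unique.
    unique=$(sort -u "$file" | wc -l | tr -d ' ')
    # Blank or whitespace-only lines (grep exits 1 on zero matches, hence || true).
    blank=$(grep -c '^[[:space:]]*$' "$file" || true)
    echo "total=$total unique=$unique blank=$blank"
}
```

Running "check_urls ./myurls.txt" prints the three counts on one line. If total and unique differ, the list contains duplicates.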

Because the aspseek database has been corrupted for some unknown reason and "index -H" simply aborts, I apparently have to start the entire index over again and keep my fingers crossed. So my question is: can I insert this HUGE file of URLs using:

./index -i -f ./myurls.txt

and expect all 3,112,768 URLs to be inserted? I'm sure that part will work, but the bigger question is what happens when I run index to fetch these documents:

./index -N 80 -R 64

Will index handle all this? Will it eat up all the available memory (2GB) trying to load all these URLs at once? I've had problems with aspseek eating up all memory and eventually thrashing the disk cache after inserting as few as 250,000 URLs. Searching causes no problems, but index is a memory hog. I understand that the aspseek.conf file has this directive:

NextDocLimit 1000

That is the default value, but I don't know whether it has anything to do with this or not. What I can say is that I have found that other directives, such as MaxBandwidth, do NOT work as documented (reported as bug #26). So I'm afraid that if NextDocLimit does what I think it does and it has bugs too, I may be wasting my time on this whole project.
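If one huge insert turns out to be the problem, a fallback I could try is feeding the list in smaller pieces. This is only a sketch. I have NOT confirmed that inserting and indexing smaller batches keeps index's memory use down, and the function name split_urls, the batch size, and the file prefix are my own choices rather than anything from aspseek.

```shell
#!/bin/sh
# Split a one-URL-per-line file into smaller batch files.
# split_urls is a hypothetical helper name, not an aspseek tool.
split_urls() {
    file="$1"        # input list, e.g. ./myurls.txt
    per_batch="$2"   # maximum number of URLs per batch file
    prefix="$3"      # output prefix, e.g. ./batch.
    # POSIX split writes prefixaa, prefixab, ... with per_batch lines each.
    split -l "$per_batch" "$file" "$prefix"
    # Report how many batch files now match the prefix.
    ls "$prefix"* | wc -l | tr -d ' '
}
```

For example, "split_urls ./myurls.txt 100000 ./batch." would write files named ./batch.aa, ./batch.ab and so on, each of which could then be passed to "./index -i -f" one at a time.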

If anyone has suggestions for an alternative indexing and search program, please let me know. We use aspseek on our intranet, not as a public search engine, and fetching the entire document and providing a "cached" version is what matters most to us.

Thanks,
Karen
