Kord Campbell
Wed, 18 Sep 2002 10:50:12 -0700
How fast does it return a search result? We had managed to index about a million sites about a year and a half ago, and the search times were horrible.
Oh, BTW, we do a fair bit of crawling the Internet ourselves. I've always envisioned that aspseek could have a plugin to take data from us, but we figured that it couldn't handle the millions of URLs that we were crawling every day.

Kord

On Wed, 18 Sep 2002, Yuriy Soroka wrote:

> Yes,
>
> I have indexed 255 179 URLs
> I was indexing by 20000 - 40000 URLs
>
> var dir size - 1.5 Gb
> I can't say for certain size of mysql database.
>
> Hardware 2 CPU 1.1 GHz each, about 1.5 G of RAM
> OS - FreeBSD 4.5 release p6
>
> no special kernel/mysql tuning was done.
>
>
> ----- Original Message -----
> From: "Gregory Kozlovsky" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Wednesday, September 18, 2002 7:05 PM
> Subject: RE: [aseek-devel] How to index external list of URLs?
>
>
> > This is interesting. Can you share with us the size of your database
> > (in docs and in GB), details of your hardware, and tuning of the Linux
> > kernel and the mysql server?
> >
> > Gregory Kozlovsky
> >
> > -----Original Message-----
> > From: Yuriy Soroka [mailto:[EMAIL PROTECTED]]
> > Sent: Wednesday, 18 September 2002 02:43
> > To: [EMAIL PROTECTED]
> > Subject: Re: [aseek-devel] How to index external list of URLs?
> >
> >
> > Why don't you just include them to aspseek.conf
> >
> > I indexed 250 000 urls.
> >
> > Include myfile.txt
> >
> >
> > ----- Original Message -----
> > From: "J and T" <[EMAIL PROTECTED]>
> > To: <[EMAIL PROTECTED]>
> > Sent: Wednesday, September 18, 2002 3:10 AM
> > Subject: [aseek-devel] How to index external list of URLs?
> >
> >
> > > How in the world do you index a list of URLs NOT in the aspseek.conf?
> > > I have tried everything I can think of:
> > >
> > > ./index -i -f myfile.txt
> > > ./index -N 100
> > >
> > > Doesn't work.
> > > The myfile.txt lists 5,000 URLs like this:
> > >
> > > Server http://someserver.com/
> > >
> > > But when I run the above (ie, ./index -i -f myfile.txt)
> > >
> > > I get the following error:
> > >
> > > Bad URL: Server http://someserver.com/
> > >
> > > So I removed the "Server " so now it reads:
> > >
> > > http://someserver.com/
> > >
> > > Did the same thing:
> > >
> > > ./index -i -f myfile.txt
> > >
> > > Now it shows them in the database:
> > >
> > > ./index -S
> > >
> > > ASPseek database statistics
> > >
> > > Status Expired Total
> > > -----------------------------
> > >      0    5000  5000 Not indexed yet
> > > -----------------------------
> > >  Total    5000  5000
> > >
> > > So now I try to run the indexer:
> > >
> > > ./index -N 100
> > >
> > > And now the indexer gives the same damn error:
> > >
> > > No "Server" command for URL http://www.someserver.com/ - deleted.
> > > ( 0 1 1 0 0 0 0 21) Adding URL: http://www.someserver.com/
> > >
> > > So all it did was delete all these URLs. I have tried every other
> > > combination I can think of after reviewing the ./index -h, but nothing
> > > seems to work. How in the world do you get these indexed using an
> > > external file?
> > >
> > > Also before when I hard coded all URLs in aspseek.conf there were about
> > > 200 URLs which were always shown as "Not Yet Index". How in the heck do
> > > you get them indexed or delete the damn things?
> > >
> > > It doesn't make sense to have to add thousands of URLs in the
> > > aspseek.conf file every time you want to add new URLs to the list. You
> > > certainly don't want to set the system to reindex everything, especially
> > > if you just added 5,000 URLs the day before. That would use unnecessary
> > > bandwidth to say the least.
> > >
> > > Anyone have any suggestions?
> > >
> > > end.
> > >
> > > _________________________________________________________________
> > > Chat with friends online, try MSN Messenger: http://messenger.msn.com
> > >

--
--------------------------------------------------------------
Kord Campbell                       Grub.Org Inc.
President                           6051 N. Brookline #118
                                    Oklahoma City, OK 73112
[EMAIL PROTECTED]                   Voice: (405) 843-6336
http://www.grub.org                 Fax:   (405) 848-5477
--------------------------------------------------------------
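
For anyone trying the Include route suggested in the quoted thread, here is a rough sketch of turning a plain list of URLs into a file of "Server" lines that aspseek.conf could pull in with "Include myfile.txt". It assumes the included file wants one "Server <url>" line per site, which is the format the original myfile.txt used. The function names and the file names in the usage note are made up for illustration, and none of this has been tested against a real ASPseek install.

```python
def to_server_lines(urls):
    """Turn bare URLs into 'Server <url>' lines.

    Blank entries and exact duplicates are skipped; the original
    order of the remaining URLs is kept.
    """
    seen = set()
    lines = []
    for raw in urls:
        url = raw.strip()
        if not url or url in seen:
            continue
        seen.add(url)
        lines.append("Server " + url)
    return lines


def write_include_file(urls, path):
    """Write the 'Server' lines to path, one per line.

    Returns the number of lines written. The path is whatever file
    aspseek.conf names in its Include line (an assumption here).
    """
    lines = to_server_lines(urls)
    with open(path, "w") as out:
        for line in lines:
            out.write(line + "\n")
    return len(lines)
```

Usage would be something like reading the bare URLs from a hypothetical urls.txt with open("urls.txt") and passing the file object to write_include_file(f, "myfile.txt"), then adding "Include myfile.txt" to aspseek.conf as Yuriy describes.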