Yuriy Soroka
Wed, 18 Sep 2002 11:10:18 -0700
It depends on the number of search words in the query. Normally a 2-3
word query is returned within a fraction of a second. A complicated
query takes about 1 second, maybe a little more.

Anyway, I am not satisfied with the performance either, and I am
interested in replacing the RDBMS with fast native filesystem storage.

Where is the bottleneck? The mysql database or the indices? To me it
seems to be the DBMS. Mysql gets too slow when you have a couple of
million records in a table.

I was thinking of adding the Berkeley DB library instead of mysql. For
now these are just thoughts.

If anyone can share their experience in this area, please do. I will be
glad to hear your suggestions.

Yuriy

----- Original Message -----
From: "Kord Campbell" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, September 18, 2002 9:10 PM
Subject: Re: [aseek-devel] How to index external list of URLs?

> How fast does it return a search result? We had managed to index
> about a million sites about a year and a half ago, and the search
> times were horrible.
>
> Oh, BTW, we do a fair bit of crawling the Internet ourselves. I've
> always envisioned that aspseek could have a plugin to take data
> from us, but we figured that it couldn't handle the millions of
> URLs that we were crawling every day.
>
> Kord
>
> On Wed, 18 Sep 2002, Yuriy Soroka wrote:
>
> > Yes,
> >
> > I have indexed 255 179 URLs
> > I was indexing 20000 - 40000 URLs at a time
> >
> > var dir size - 1.5 GB
> > I can't say for certain what the size of the mysql database is.
> >
> > Hardware: 2 CPUs, 1.1 GHz each, about 1.5 GB of RAM
> > OS - FreeBSD 4.5 release p6
> >
> > No special kernel/mysql tuning was done.
> >
> > ----- Original Message -----
> > From: "Gregory Kozlovsky" <[EMAIL PROTECTED]>
> > To: <[EMAIL PROTECTED]>
> > Sent: Wednesday, September 18, 2002 7:05 PM
> > Subject: RE: [aseek-devel] How to index external list of URLs?
> >
> > > This is interesting. Can you share with us the size of your
> > > database (in docs and in GB), details of your hardware, and
> > > tuning of the Linux kernel and the mysql server?
> > >
> > > Gregory Kozlovsky
> > >
> > > -----Original Message-----
> > > From: Yuriy Soroka [mailto:[EMAIL PROTECTED]]
> > > Sent: Wednesday, 18 September 2002 02:43
> > > To: [EMAIL PROTECTED]
> > > Subject: Re: [aseek-devel] How to index external list of URLs?
> > >
> > > Why don't you just include them in aspseek.conf?
> > >
> > > I indexed 250 000 urls.
> > >
> > > Include myfile.txt
> > >
> > > ----- Original Message -----
> > > From: "J and T" <[EMAIL PROTECTED]>
> > > To: <[EMAIL PROTECTED]>
> > > Sent: Wednesday, September 18, 2002 3:10 AM
> > > Subject: [aseek-devel] How to index external list of URLs?
> > >
> > > > How in the world do you index a list of URLs NOT in the
> > > > aspseek.conf? I have tried everything I can think of:
> > > >
> > > > ./index -i -f myfile.txt
> > > > ./index -N 100
> > > >
> > > > Doesn't work. The myfile.txt lists 5,000 URLs like this:
> > > >
> > > > Server http://someserver.com/
> > > >
> > > > But when I run the above (i.e., ./index -i -f myfile.txt)
> > > >
> > > > I get the following error:
> > > >
> > > > Bad URL: Server http://someserver.com/
> > > >
> > > > So I removed the "Server " so now it reads:
> > > >
> > > > http://someserver.com/
> > > >
> > > > Did the same thing:
> > > >
> > > > ./index -i -f myfile.txt
> > > >
> > > > Now it shows them in the database:
> > > >
> > > > ./index -S
> > > >
> > > > ASPseek database statistics
> > > >
> > > > Status  Expired  Total
> > > > -----------------------------
> > > >      0     5000   5000 Not indexed yet
> > > > -----------------------------
> > > >  Total     5000   5000
> > > >
> > > > So now I try to run the indexer:
> > > >
> > > > ./index -N 100
> > > >
> > > > And now the indexer gives the same damn error:
> > > >
> > > > No "Server" command for URL http://www.someserver.com/ - deleted.
> > > > ( 0 1 1 0 0 0 0 21) Adding URL: http://www.someserver.com/
> > > >
> > > > So all it did was delete all these URLs. I have tried every
> > > > other combination I can think of after reviewing the
> > > > ./index -h, but nothing seems to work. How in the world do you
> > > > get these indexed using an external file?
> > > >
> > > > Also, before, when I hard coded all URLs in aspseek.conf, there
> > > > were about 200 URLs which were always shown as "Not Yet Index".
> > > > How in the heck do you get them indexed or delete the damn
> > > > things?
> > > >
> > > > It doesn't make sense to have to add thousands of URLs in the
> > > > aspseek.conf file every time you want to add new URLs to the
> > > > list. You certainly don't want to set the system to reindex
> > > > everything, especially if you just added 5,000 URLs the day
> > > > before. That would use unnecessary bandwidth, to say the least.
> > > >
> > > > Anyone have any suggestions?
> > > >
> > > > end.
>
> --
> --------------------------------------------------------------
> Kord Campbell                     Grub.Org Inc.
> President                         6051 N. Brookline #118
>                                   Oklahoma City, OK 73112
> [EMAIL PROTECTED]                 Voice: (405) 843-6336
> http://www.grub.org               Fax: (405) 848-5477
> --------------------------------------------------------------
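
The thread reports two things about the external file: ./index -i -f rejects lines that begin with "Server ", and the indexer deletes bare URLs that have no matching "Server" command. The suggested fix is to pull the file in from aspseek.conf with "Include myfile.txt". A minimal sketch of preparing such a file from a plain list of URLs might look like the following. The function name to_server_lines is hypothetical, and the only ASPseek behaviour assumed is what the messages above report.

```python
def to_server_lines(urls):
    """Turn bare URLs into 'Server <url>' lines for a file that
    aspseek.conf pulls in via 'Include myfile.txt'.

    Assumption: one 'Server <url>' command per line, as shown in the
    thread. Blank lines, '#' comments and duplicates are skipped.
    """
    seen = set()
    lines = []
    for raw in urls:
        url = raw.strip()
        if not url or url.startswith("#"):
            continue
        # Accept input that already carries the "Server " prefix.
        if url.lower().startswith("server "):
            url = url[len("server "):].strip()
        if url in seen:
            continue
        seen.add(url)
        lines.append(f"Server {url}")
    return lines
```

The returned lines can be joined with newlines and written to myfile.txt. New URLs are then added by regenerating that one file rather than editing aspseek.conf by hand. Whether the indexer then fetches only the new URLs is not confirmed anywhere in the thread.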