Kir Kolyshkin
Thu, 19 Sep 2002 01:50:08 -0700
> no special kernel/mysql tuning was done.

Well, you haven't even tried to increase MySQL's key_buffer size (which is even described in ASPseek's FAQ), but you are already looking at rewriting the code. That seems a weird approach to me.
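As a rough illustration of that tuning step, here is a minimal sketch that derives a starting key buffer size from the machine's RAM and prints an option-file fragment. The 25% fraction is the general rule of thumb for a machine that mainly runs MySQL, not a value taken from the ASPseek FAQ; on a box that also runs the indexer and the search daemon a smaller fraction may be more sensible. The function names are made up for this example, and the option spelling is an assumption that depends on the MySQL version (older servers expect `set-variable = key_buffer=...` instead of `key_buffer_size`).

```python
def suggest_key_buffer_mb(total_ram_mb: int, fraction: float = 0.25) -> int:
    """Return a starting key buffer size in MB as a fraction of total RAM.

    The default fraction is only a rule of thumb (an assumption here),
    meant as a starting point to be adjusted after measuring.
    """
    if total_ram_mb <= 0:
        raise ValueError("total_ram_mb must be positive")
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    return int(total_ram_mb * fraction)


def my_cnf_fragment(key_buffer_mb: int) -> str:
    """Render a [mysqld] option-file fragment that sets the key buffer.

    Assumes a server that accepts key_buffer_size directly in my.cnf.
    """
    return f"[mysqld]\nkey_buffer_size = {key_buffer_mb}M\n"


if __name__ == "__main__":
    # About 1.5 GB of RAM, as reported in the quoted message.
    print(my_cnf_fragment(suggest_key_buffer_mb(1536)))
```

After changing the value, restart the server and compare search times before deciding whether the storage layer needs replacing.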
Note that ASPseek does not store everything in the SQL DB. Data that is crucial to search speed is stored in its own binary files.

Yuriy Soroka wrote:
> It depends on number of search words in query.
> Normally 2-3 words query is returned within a fractions of second.
> Complicated query - about 1 second. Maybe little more.
>
> Anyway i am not satisfied with performance too, and i am interested in
> replacing RDBMS with fast native filesystem storage.
>
> where is the bottleneck? mysql database or indices?
> As for me it seems to be DBMS. Mysql is getting too slow when you have
> couple of millions records in table.
>
> I was thinking of adding Berkley DB library instead of mysql. For now it is
> just thoughts.
> If anyone can share his experience in this area, please do it.
> I will be glad to hear suggestions from you.
>
> Yuriy
>
> ----- Original Message -----
> From: "Kord Campbell" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Wednesday, September 18, 2002 9:10 PM
> Subject: Re: [aseek-devel] How to index external list of URLs?
>
>>How fast does it return a search result? We had managed to index
>>about a million sites about a year and a half ago, and the search
>>times were horrible.
>>
>>Oh, BTW, we do a fair bit of crawling the Internet ourselves. I've
>>always envisioned that aspseek could have a plugin to take data
>>from us, but we figured that it couldn't handle the millions of
>>URLs that we were crawling everyday.
>>
>>Kord
>>
>>On Wed, 18 Sep 2002, Yuriy Soroka wrote:
>>
>>>Yes,
>>>
>>>I have indexed 255 179 URLs
>>>I was indexing by 20000 - 40000 URLs
>>>
>>>var dir size - 1.5 Gb
>>>I can't say for certain size of mysql database.
>>>
>>>Hardware 2 CPU 1.1 GHz each, about 1.5 G of RAM
>>>OS - FreeBSD 4.5 release p6
>>>
>>>no special kernel/mysql tuning was done.
>>>
>>>----- Original Message -----
>>>From: "Gregory Kozlovsky" <[EMAIL PROTECTED]>
>>>To: <[EMAIL PROTECTED]>
>>>Sent: Wednesday, September 18, 2002 7:05 PM
>>>Subject: RE: [aseek-devel] How to index external list of URLs?
>>>
>>>>This is interesting. Can you share with us the size of your database (in
>>>>docs and in GB),
>>>>details of your hardware, and tuning of the Linux kernel and the mysql
>>>>server?
>>>>
>>>> Gregory Kozlovsky
>>>>
>>>>-----Original Message-----
>>>>From: Yuriy Soroka [mailto:[EMAIL PROTECTED]]
>>>>Sent: Wednesday, 18 September 2002 02:43
>>>>To: [EMAIL PROTECTED]
>>>>Subject: Re: [aseek-devel] How to index external list of URLs?
>>>>
>>>>Why don't you just include them to aspseek.conf
>>>>
>>>>I indexed 250 000 urls.
>>>>
>>>>Include myfile.txt
>>>>
>>>>----- Original Message -----
>>>>From: "J and T" <[EMAIL PROTECTED]>
>>>>To: <[EMAIL PROTECTED]>
>>>>Sent: Wednesday, September 18, 2002 3:10 AM
>>>>Subject: [aseek-devel] How to index external list of URLs?
>>>>
>>>>>How in the world do you index a list of URLs NOT in the aspseek.conf? I
>>>>>have tried everything I can think of:
>>>>>
>>>>>./index -i -f myfile.txt
>>>>>./index -N 100
>>>>>
>>>>>Doesn't work.
>>>>>The myfile.txt lists 5,000 URLs like this:
>>>>>
>>>>>Server http://someserver.com/
>>>>>
>>>>>But when I run the above (ie, ./index -i -f myfile.txt)
>>>>>
>>>>>I get the following error:
>>>>>
>>>>>Bad URL: Server http://someserver.com/
>>>>>
>>>>>So I removed the "Server " so now it reads:
>>>>>
>>>>>http://someserver.com/
>>>>>
>>>>>Did the same thing:
>>>>>
>>>>>./index -i -f myfile.txt
>>>>>
>>>>>Now it shows them in the database:
>>>>>
>>>>>./index -S
>>>>>
>>>>>ASPseek database statistics
>>>>>
>>>>> Status  Expired  Total
>>>>> -----------------------------
>>>>>      0     5000   5000  Not indexed yet
>>>>> -----------------------------
>>>>>  Total     5000   5000
>>>>>
>>>>>So now I try to run the indexer:
>>>>>
>>>>>./index -N 100
>>>>>
>>>>>And now the indexer gives the same damm error:
>>>>>
>>>>>No "Server" command for URL http://www.someserver.com/ - deleted.
>>>>>( 0 1 1 0 0 0 0 21) Adding URL: http://www.someserver.com/
>>>>>
>>>>>So all it did was delete all these URLs. I have tried every other
>>>>>combination I can think of after reviewing the ./index -h, but nothing
>>>>>seems to work. How in the word do you get these indexed using an external
>>>>>file?
>>>>>
>>>>>Also before when I hard coded all URLs in aspseek.conf there were about
>>>>>200 URLs which were always shown as "Not Yet Index". How in the heck do you
>>>>>get them index or delete the damm things?
>>>>>
>>>>>It doesn't make sense to have to add thousands of URLs in the aspseek.conf
>>>>>file every time you want to add new URLs to the list. You certainly don't
>>>>>want to set the system to reindex everything specially if you just added
>>>>>5,000 URLs the day before. That would use unecessary bandwidth to say the
>>>>>least.
>>>>>
>>>>>Anyone have any suggestions?
>>>>>
>>>>>end.
>>
>>--
>>--------------------------------------------------------------
>>Kord Campbell                  Grub.Org Inc.
>>President                      6051 N. Brookline #118
>>                               Oklahoma City, OK 73112
>>[EMAIL PROTECTED]              Voice: (405) 843-6336
>>http://www.grub.org            Fax: (405) 848-5477
>>--------------------------------------------------------------

--
-- [EMAIL PROTECTED] ICQ7551596 [EMAIL PROTECTED] --
Guinness a Day Keeps a Doctor Away (people's wisdom)
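
For the original question in the quoted thread (a plain URL list versus the `Server` lines that the indexer asked for), a small helper along these lines could turn a bare list of URLs into a file suitable for the `Include myfile.txt` approach Yuriy describes. This is only a sketch: it assumes the included file may consist of one `Server <url>` directive per line, as shown in the thread, and the input file name `urls.txt`, the function names, the `#` comment handling, and the http/https check are all choices made for this example rather than anything ASPseek prescribes.

```python
from urllib.parse import urlsplit


def to_server_lines(urls):
    """Turn raw URL lines into 'Server <url>' lines.

    Blank lines and lines starting with '#' are skipped, duplicates are
    dropped, and the original order is preserved.
    """
    seen = set()
    lines = []
    for raw in urls:
        url = raw.strip()
        if not url or url.startswith("#"):
            continue
        parts = urlsplit(url)
        # Accepting only absolute http(s) URLs is an assumption of this sketch.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
        if url not in seen:
            seen.add(url)
            lines.append(f"Server {url}")
    return lines


def write_include_file(src_path, dst_path):
    """Read plain URLs from src_path and write Server lines to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        lines = to_server_lines(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(lines) + "\n")
    return len(lines)


if __name__ == "__main__":
    # urls.txt is a hypothetical input file with one URL per line.
    count = write_include_file("urls.txt", "myfile.txt")
    print(f"wrote {count} Server lines to myfile.txt")
```

The generated myfile.txt would then be referenced from aspseek.conf with `Include myfile.txt`, and the indexer run as in the thread.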