Kir Kolyshkin
Thu, 19 Sep 2002 01:46:49 -0700
ASPseek's theoretical limit is about 50 million pages (URLs). How many URLs do you have?
Kord Campbell wrote:
> How fast does it return a search result? We had managed to index
> about a million sites about a year and a half ago, and the search
> times were horrible.
>
> Oh, BTW, we do a fair bit of crawling the Internet ourselves. I've
> always envisioned that aspseek could have a plugin to take data
> from us, but we figured that it couldn't handle the millions of
> URLs that we were crawling every day.
>
> Kord
>
> On Wed, 18 Sep 2002, Yuriy Soroka wrote:
>
>>Yes,
>>
>>I have indexed 255 179 URLs
>>I was indexing in batches of 20000 - 40000 URLs
>>
>>var dir size - 1.5 GB
>>I can't say for certain what the size of the mysql database is.
>>
>>Hardware 2 CPU 1.1 GHz each, about 1.5 GB of RAM
>>OS - FreeBSD 4.5 release p6
>>
>>no special kernel/mysql tuning was done.
>>
>>----- Original Message -----
>>From: "Gregory Kozlovsky" <[EMAIL PROTECTED]>
>>To: <[EMAIL PROTECTED]>
>>Sent: Wednesday, September 18, 2002 7:05 PM
>>Subject: RE: [aseek-devel] How to index external list of URLs?
>>
>>>This is interesting. Can you share with us the size of your database (in
>>>docs and in GB), details of your hardware, and tuning of the Linux
>>>kernel and the mysql server?
>>>
>>> Gregory Kozlovsky
>>>
>>>-----Original Message-----
>>>From: Yuriy Soroka [mailto:[EMAIL PROTECTED]]
>>>Sent: Wednesday, 18 September 2002 02:43
>>>To: [EMAIL PROTECTED]
>>>Subject: Re: [aseek-devel] How to index external list of URLs?
>>>
>>>Why don't you just include them in aspseek.conf?
>>>
>>>I indexed 250 000 urls.
>>>
>>>Include myfile.txt
>>>
>>>----- Original Message -----
>>>From: "J and T" <[EMAIL PROTECTED]>
>>>To: <[EMAIL PROTECTED]>
>>>Sent: Wednesday, September 18, 2002 3:10 AM
>>>Subject: [aseek-devel] How to index external list of URLs?
>>>
>>>>How in the world do you index a list of URLs NOT in the aspseek.conf? I
>>>>have tried everything I can think of:
>>>>
>>>>./index -i -f myfile.txt
>>>>./index -N 100
>>>>
>>>>Doesn't work.
>>>>The myfile.txt lists 5,000 URLs like this:
>>>>
>>>>Server http://someserver.com/
>>>>
>>>>But when I run the above (i.e., ./index -i -f myfile.txt)
>>>>
>>>>I get the following error:
>>>>
>>>>Bad URL: Server http://someserver.com/
>>>>
>>>>So I removed the "Server " so now it reads:
>>>>
>>>>http://someserver.com/
>>>>
>>>>Did the same thing:
>>>>
>>>>./index -i -f myfile.txt
>>>>
>>>>Now it shows them in the database:
>>>>
>>>>./index -S
>>>>
>>>>ASPseek database statistics
>>>>
>>>>  Status    Expired      Total
>>>>  -----------------------------
>>>>       0       5000       5000 Not indexed yet
>>>>  -----------------------------
>>>>   Total       5000       5000
>>>>
>>>>So now I try to run the indexer:
>>>>
>>>>./index -N 100
>>>>
>>>>And now the indexer gives the same damn error:
>>>>
>>>>No "Server" command for URL http://www.someserver.com/ - deleted.
>>>>( 0 1 1 0 0 0 0 21) Adding URL: http://www.someserver.com/
>>>>
>>>>So all it did was delete all these URLs. I have tried every other
>>>>combination I can think of after reviewing the ./index -h, but nothing
>>>>seems to work. How in the world do you get these indexed using an
>>>>external file?
>>>>
>>>>Also before when I hard coded all URLs in aspseek.conf there were about
>>>>200 URLs which were always shown as "Not indexed yet". How in the heck
>>>>do you get them indexed or delete the damn things?
>>>>
>>>>It doesn't make sense to have to add thousands of URLs in the
>>>>aspseek.conf file every time you want to add new URLs to the list. You
>>>>certainly don't want to set the system to reindex everything,
>>>>especially if you just added 5,000 URLs the day before. That would use
>>>>unnecessary bandwidth to say the least.
>>>>
>>>>Anyone have any suggestions?
>>>>
>>>>end.
-- 
--  [EMAIL PROTECTED]  ICQ7551596  [EMAIL PROTECTED]  --
Guinness a Day Keeps a Doctor Away (people's wisdom)
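
The approach suggested in the thread (keep the URLs in a separate file of "Server" lines and pull it into aspseek.conf with "Include myfile.txt") needs a file in which every URL carries the "Server" prefix. A minimal sketch of producing such a file from a plain list of URLs might look like the following. The function names, the comment-skipping rule, and the http/https-only filter are assumptions made for illustration; they are not part of ASPseek, and the exact syntax that "Include" accepts should be checked against your ASPseek version.

```python
from urllib.parse import urlsplit


def to_server_lines(raw_lines):
    """Turn bare URLs into 'Server <url>' lines (hypothetical helper).

    Blank lines and lines starting with '#' are skipped, an existing
    'Server ' prefix is tolerated, and duplicates are dropped while
    keeping the first occurrence's position.
    """
    seen = set()
    out = []
    for line in raw_lines:
        url = line.strip()
        if not url or url.startswith("#"):
            continue
        # Accept input that already carries the "Server " prefix.
        if url.lower().startswith("server "):
            url = url[len("server "):].strip()
        parts = urlsplit(url)
        # Assumption: only absolute http(s) URLs are wanted.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue
        if url not in seen:
            seen.add(url)
            out.append(f"Server {url}")
    return out


def write_include_file(raw_lines, path):
    """Write the 'Server' lines to path and return how many were written."""
    lines = to_server_lines(raw_lines)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + ("\n" if lines else ""))
    return len(lines)
```

With a hypothetical input file urls.txt, calling write_include_file(open("urls.txt"), "myfile.txt") would produce the file that the "Include myfile.txt" line in aspseek.conf refers to. This sidesteps the "Bad URL" and "No "Server" command" errors quoted above only under the assumption that the included file is parsed exactly like the rest of aspseek.conf.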