aseek-devel  

Re: [aseek-devel] How to index external list of URLs?

Yuriy Soroka
Wed, 18 Sep 2002 09:09:49 -0700

Yes,

I have indexed 255,179 URLs.
I indexed them in batches of 20,000 to 40,000 URLs.

The var directory is 1.5 GB.
I can't say for certain what the size of the MySQL database is.

Hardware: 2 CPUs at 1.1 GHz each, about 1.5 GB of RAM
OS: FreeBSD 4.5-RELEASE-p6

No special kernel or MySQL tuning was done.
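
For anyone starting from a plain list of URLs, here is a minimal sketch of how an include file of "Server" lines, like the myfile.txt discussed in this thread, might be generated and split into batches. The function names and the idea of de-duplicating are assumptions for illustration only; this is not part of ASPseek itself.

```python
from typing import Iterable, Iterator, List


def server_lines(urls: Iterable[str]) -> List[str]:
    """Turn bare URLs into 'Server <url>' lines, skipping blanks and duplicates."""
    seen = set()
    lines = []
    for raw in urls:
        url = raw.strip()
        if not url or url in seen:
            continue  # ignore empty lines and URLs already emitted
        seen.add(url)
        lines.append(f"Server {url}")
    return lines


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items (hypothetical helper)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Each batch could then be written to its own file and pulled into aspseek.conf with an Include line, as suggested in the quoted message below.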

----- Original Message -----
From: "Gregory Kozlovsky" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, September 18, 2002 7:05 PM
Subject: RE: [aseek-devel] How to index external list of URLs?


> This is interesting. Can you share with us the size of your database (in
> docs and in GB), details of your hardware, and tuning of the Linux kernel
> and the MySQL server?
>
>      Gregory Kozlovsky
>
> -----Original Message-----
> From: Yuriy Soroka [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, 18 September 2002 02:43
> To: [EMAIL PROTECTED]
> Subject: Re: [aseek-devel] How to index external list of URLs?
>
>
> Why don't you just include them in aspseek.conf?
>
> I indexed 250,000 URLs.
>
> Include myfile.txt
>
>
> ----- Original Message -----
> From: "J and T" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Wednesday, September 18, 2002 3:10 AM
> Subject: [aseek-devel] How to index external list of URLs?
>
>
> > How in the world do you index a list of URLs NOT in the aspseek.conf? I have
> > tried everything I can think of:
> >
> > ./index -i -f myfile.txt
> > ./index -N 100
> >
> > Doesn't work. The myfile.txt lists 5,000 URLs like this:
> >
> > Server http://someserver.com/
> >
> > But when I run the above (ie, ./index -i -f myfile.txt)
> >
> > I get the following error:
> >
> > Bad URL: Server http://someserver.com/
> >
> > So I removed the "Server " so now it reads:
> >
> > http://someserver.com/
> >
> > Did the same thing:
> >
> > ./index -i -f myfile.txt
> >
> > Now it shows them in the database:
> >
> > ./index -S
> >
> > ASPseek database statistics
> >
> >     Status    Expired      Total
> >    -----------------------------
> >          0       5000       5000 Not indexed yet
> >    -----------------------------
> >      Total       5000       5000
> >
> > So now I try to run the indexer:
> >
> > ./index -N 100
> >
> > And now the indexer gives the same damn error:
> >
> > No "Server" command for URL http://www.someserver.com/ - deleted.
> > ( 0  1  1  0  0  0  0 21) Adding URL: http://www.someserver.com/
> >
> > So all it did was delete all these URLs. I have tried every other
> > combination I can think of after reviewing the ./index -h, but nothing seems
> > to work. How in the world do you get these indexed using an external file?
> >
> > Also, before, when I hard-coded all URLs in aspseek.conf, there were about 200
> > URLs which were always shown as "Not indexed yet". How in the heck do you get
> > them indexed or delete the damn things?
> >
> > It doesn't make sense to have to add thousands of URLs to the aspseek.conf
> > file every time you want to add new URLs to the list. You certainly don't
> > want to set the system to reindex everything, especially if you just added
> > 5,000 URLs the day before. That would use unnecessary bandwidth, to say the
> > least.
> >
> > Anyone have any suggestions?
> >
> > end.
> >