aseek-devel  

RE: [aseek-devel] How to index external list of URLs?

Gregory Kozlovsky
Wed, 18 Sep 2002 08:41:08 -0700

This is interesting. Can you share with us the size of your database (in
docs and in GB), details of your hardware, and your tuning of the Linux
kernel and the MySQL server?

     Gregory Kozlovsky

-----Original Message-----
From: Yuriy Soroka [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, 18 September 2002 02:43
To: [EMAIL PROTECTED]
Subject: Re: [aseek-devel] How to index external list of URLs?


Why don't you just include them in aspseek.conf?

I indexed 250,000 URLs this way:

Include myfile.txt
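
If the list is bare URLs, a small script can turn it into "Server" lines
for the included file. This is only a rough sketch: it assumes the included
file takes the same "Server <url>" lines as aspseek.conf, which is what this
thread suggests. The function names and the urls.txt file name are made up
for illustration and are not part of ASPseek.

```python
def to_server_lines(urls):
    """Turn bare URLs into 'Server <url>' lines.

    Skips blank lines, '#' comment lines and duplicate URLs.
    Lines that already start with 'Server ' are accepted as they are.
    """
    seen = set()
    lines = []
    for raw in urls:
        url = raw.strip()
        if not url or url.startswith("#"):
            continue
        # Strip an existing directive so both input styles end up the same.
        if url.lower().startswith("server "):
            url = url[len("server "):].strip()
        if url in seen:
            continue
        seen.add(url)
        lines.append("Server " + url)
    return lines


def write_include_file(src_path, dst_path):
    """Read bare URLs from src_path, write 'Server' lines to dst_path.

    Returns the number of lines written.
    """
    with open(src_path) as src:
        lines = to_server_lines(src)
    with open(dst_path, "w") as dst:
        dst.write("".join(line + "\n" for line in lines))
    return len(lines)
```

For example, write_include_file("urls.txt", "myfile.txt") would produce the
file named in the Include line above.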


----- Original Message -----
From: "J and T" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, September 18, 2002 3:10 AM
Subject: [aseek-devel] How to index external list of URLs?


> How in the world do you index a list of URLs NOT in the aspseek.conf? I have
> tried everything I can think of:
>
> ./index -i -f myfile.txt
> ./index -N 100
>
> Doesn't work. The myfile.txt lists 5,000 URLs like this:
>
> Server http://someserver.com/
>
> But when I run the above (i.e., ./index -i -f myfile.txt)
>
> I get the following error:
>
> Bad URL: Server http://someserver.com/
>
> So I removed the "Server " so now it reads:
>
> http://someserver.com/
>
> Did the same thing:
>
> ./index -i -f myfile.txt
>
> Now it shows them in the database:
>
> ./index -S
>
> ASPseek database statistics
>
>     Status    Expired      Total
>    -----------------------------
>          0       5000       5000 Not indexed yet
>    -----------------------------
>      Total       5000       5000
>
> So now I try to run the indexer:
>
> ./index -N 100
>
> And now the indexer gives the same damn error:
>
> No "Server" command for URL http://www.someserver.com/ - deleted.
> ( 0  1  1  0  0  0  0 21) Adding URL: http://www.someserver.com/
>
> So all it did was delete all these URLs. I have tried every other
> combination I can think of after reviewing the ./index -h, but nothing seems
> to work. How in the world do you get these indexed using an external file?
>
> Also, before, when I hard-coded all URLs in aspseek.conf, there were about 200
> URLs which were always shown as "Not indexed yet". How in the heck do you get
> them indexed, or delete the damn things?
>
> It doesn't make sense to have to add thousands of URLs in the aspseek.conf
> file every time you want to add new URLs to the list. You certainly don't
> want to set the system to reindex everything, especially if you just added
> 5,000 URLs the day before. That would use unnecessary bandwidth to say the
> least.
>
> Anyone have any suggestions?
>
> end.
>