aseek-devel  

Re: [aseek-devel] How to index external list of URLs?

Yuriy Soroka
Tue, 17 Sep 2002 17:18:44 -0700

Why don't you just include them in aspseek.conf?

I indexed 250,000 URLs that way.

Include myfile.txt
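
If myfile.txt has already been stripped down to bare URLs (as in the second attempt quoted below), the "Server " prefix needs to go back on each line before the file is included. A minimal sketch of that conversion, assuming one URL per line; the function name to_server_lines is hypothetical and not part of ASPseek:

```python
def to_server_lines(urls):
    """Turn bare URLs into 'Server <url>' lines, skipping blanks and duplicates.

    Assumption: one URL per input line. Lines that already start with
    'Server ' are accepted and normalised rather than double-prefixed.
    """
    seen = set()
    lines = []
    for raw in urls:
        url = raw.strip()
        if not url:
            continue  # ignore empty lines
        if url.lower().startswith("server "):
            url = url[len("server "):].strip()  # drop an existing prefix
        if url in seen:
            continue  # keep only the first occurrence of each URL
        seen.add(url)
        lines.append("Server " + url)
    return lines


# Small demonstration with the placeholder host from this thread.
sample = ["http://someserver.com/\n", "\n", "http://someserver.com/\n"]
for line in to_server_lines(sample):
    print(line)
```

Write the resulting lines to myfile.txt, keep the Include line above in aspseek.conf, and then run ./index as usual.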


----- Original Message -----
From: "J and T" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, September 18, 2002 3:10 AM
Subject: [aseek-devel] How to index external list of URLs?


> How in the world do you index a list of URLs NOT in the aspseek.conf? I have
> tried everything I can think of:
>
> ./index -i -f myfile.txt
> ./index -N 100
>
> Doesn't work. The myfile.txt lists 5,000 URLs like this:
>
> Server http://someserver.com/
>
> But when I run the above (i.e., ./index -i -f myfile.txt)
>
> I get the following error:
>
> Bad URL: Server http://someserver.com/
>
> So I removed the "Server " so now it reads:
>
> http://someserver.com/
>
> Did the same thing:
>
> ./index -i -f myfile.txt
>
> Now it shows them in the database:
>
> ./index -S
>
> ASPseek database statistics
>
>     Status    Expired      Total
>    -----------------------------
>          0       5000       5000 Not indexed yet
>    -----------------------------
>      Total       5000       5000
>
> So now I try to run the indexer:
>
> ./index -N 100
>
> And now the indexer gives the same damn error:
>
> No "Server" command for URL http://www.someserver.com/ - deleted.
> ( 0  1  1  0  0  0  0 21) Adding URL: http://www.someserver.com/
>
> So all it did was delete all these URLs. I have tried every other
> combination I can think of after reviewing the ./index -h, but nothing seems
> to work. How in the world do you get these indexed using an external file?
>
> Also, before, when I hard-coded all URLs in aspseek.conf, there were about 200
> URLs which were always shown as "Not indexed yet". How in the heck do you get
> them indexed or delete the damn things?
>
> It doesn't make sense to have to add thousands of URLs to the aspseek.conf
> file every time you want to add new URLs to the list. You certainly don't
> want to set the system to reindex everything, especially if you just added
> 5,000 URLs the day before. That would use unnecessary bandwidth, to say the
> least.
>
> Anyone have any suggestions?
>
> end.
>