Title: Re: SERVER command in aspseek.conf

I am relatively new to ASPseek and have only used it in a test environment.

First: you have indexed 4 million pages (with 4 million different URLs), but you have not indexed 4 million different servers (which you would have set with the Server command), right?

In that case I think the solution should work. Every page/URL that gets indexed is tested against the list of filters (Allow, Disallow, DisallowNoMatch, AllowNoMatch, CheckOnly, etc.). The page is stored only when the result is ALLOW; in all other cases the filter routine returns DISALLOW, and the URL is deleted.
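To make the decision flow above concrete, here is a small Python model of it. This is NOT ASPseek's code: the rule table, the first-matching-rule-wins order, the treatment of CheckOnly as "not stored", and the default DISALLOW when no rule applies are all assumptions made for illustration, and `check_url` / `index_pass` are hypothetical names.

```python
import re
from enum import Enum


class Verdict(Enum):
    ALLOW = "ALLOW"
    DISALLOW = "DISALLOW"


def check_url(url, rules):
    """Return the verdict of the first rule that applies to url.

    rules is a hypothetical list of (directive, regex) pairs; the
    directive names mirror those mentioned in the mail, but the
    matching semantics here are assumptions, not ASPseek's.
    """
    for directive, pattern in rules:
        matched = re.search(pattern, url) is not None
        if directive == "Allow" and matched:
            return Verdict.ALLOW
        if directive == "Disallow" and matched:
            return Verdict.DISALLOW
        if directive == "AllowNoMatch" and not matched:
            return Verdict.ALLOW
        if directive == "DisallowNoMatch" and not matched:
            return Verdict.DISALLOW
        if directive == "CheckOnly" and matched:
            # Assumed: the URL is checked but its page is not stored.
            return Verdict.DISALLOW
    # Assumed default when no rule applies: anything that is not ALLOW.
    return Verdict.DISALLOW


def index_pass(urls, rules):
    """Split urls into stored and to-be-deleted lists, as the mail describes."""
    stored, deleted = [], []
    for url in urls:
        if check_url(url, rules) is Verdict.ALLOW:
            stored.append(url)
        else:
            deleted.append(url)
    return stored, deleted
```

For example, with `rules = [("Disallow", r"\.gif$"), ("Allow", r"^http://www\.example\.com/")]`, the page `http://www.example.com/a.html` is stored, while `http://www.example.com/b.gif` (explicitly disallowed) and `http://other.org/` (no rule applies) both end up in the deleted list.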

The one thing I am not sure about: when a URL is not indexed because the filter says DISALLOW, the URL gets marked with a "deleted" flag in the database, so it should eventually be deleted for real.

Now the strange thing: when you look into the urlword table, you will find a lot of URLs that have the deleted flag and are still in the database.

- I am not sure why these URLs are not deleted, even though DeleteBad in aspseek.conf is set to 'yes'.
- I also wonder whether the pages/words belonging to these URLs are deleted from the database.
- I don't know what happens in the next indexing run: will those URLs be visited again, just to find out that they are still not allowed to be indexed?

PS: it's better to also put the aspseek list in the recipient list, not only me, so that all subscribers can answer you... ;-)


Markus Rietzler
* Communication & Online Service
* RZF NRW
* Tel: 0211.4572-130



-----Original Message-----
From: Fabrice VALERE [mailto:[EMAIL PROTECTED]]
Sent: Thursday, 28 June 2001 16:18
To: [EMAIL PROTECTED]
Subject: SERVER command in aspseek.conf

Hi,

I have about 4,000,000 URLs indexed.
If I use the solution (I add the Allow commands and run ./index -a), do you think the bad URLs will be deleted?

fabrice



