[aseek-devel] how to delete unwanted entries?

Karen Barnes
Wed, 18 Sep 2002 22:21:24 -0700

How does one clean things up? Here's an example using my real data:

ASPseek database statistics

    Status    Expired      Total
   -----------------------------
         0        211        211 Not indexed yet
       200          0       4738 OK
       301          0        129 Moved Permanently
       302          0        311 Moved Temporarily
       403          0          5 Forbidden
       404          0       2902 Not found
   -----------------------------
     Total        211       8296
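
For a sense of scale, here is a rough tally of the table above, sketched in Python. The row values are copied straight from the statistics output; the name `tally` is only for illustration, and nothing here touches ASPseek or MySQL.

```python
# Rows copied from the statistics output above: (status, expired, total).
ROWS = [
    (0, 211, 211),     # Not indexed yet
    (200, 0, 4738),    # OK
    (301, 0, 129),     # Moved Permanently
    (302, 0, 311),     # Moved Temporarily
    (403, 0, 5),       # Forbidden
    (404, 0, 2902),    # Not found
]


def tally(rows):
    """Return (total entries, status-200 entries, non-200 entries)."""
    total = sum(count for _, _, count in rows)
    ok = sum(count for status, _, count in rows if status == 200)
    return total, ok, total - ok


if __name__ == "__main__":
    total, ok, unwanted = tally(ROWS)
    print(f"total={total} ok={ok} non-200={unwanted} ({unwanted / total:.1%})")
```

By this count, 3558 of the 8296 entries are something other than status 200.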

My problem is that these 211 never get indexed, and the 301, 302, 403 and 404
entries are always there, taking up unnecessary disk space and other aspseek
and MySQL resources. What I want to do is remove everything that is NOT
"Status 200". How can I do this without breaking aspseek? I know you can't
just delete them, and I'm not going to hand-type every URL that has a non-200
status and try ./index -c "http://url/" on each one. That would take years!

These stats are on only 5,000 URLs. I plan to index thousands more with 
similar status results. At this point I don't care if the "Not indexed yet" 
entries ever get indexed, but I sure would like to know why they never get 
indexed and how to remove all the non-200 entries.

Anyone know how?

end
