Karen Barnes
Wed, 18 Sep 2002 22:21:24 -0700
How does one clean things up? Here's an example with real data:
ASPseek database statistics
Status   Expired   Total
-----------------------------
     0       211     211   Not indexed yet
   200         0    4738   OK
   301         0     129   Moved Permanently
   302         0     311   Moved Temporarily
   403         0       5   Forbidden
   404         0    2902   Not found
-----------------------------
 Total       211    8296
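To put numbers on this, here is a minimal Python sketch that tallies the rows above. The rows are copied from the statistics output; the names `parse_stats` and `summarize` are made up for illustration and are not part of ASPseek, and the parsing assumes the "status, expired, total, description" layout shown in the table.

```python
# Rows copied from the statistics table above: status, expired, total, description.
STATS = """\
0 211 211 Not indexed yet
200 0 4738 OK
301 0 129 Moved Permanently
302 0 311 Moved Temporarily
403 0 5 Forbidden
404 0 2902 Not found
"""


def parse_stats(text):
    """Turn each table row into a (status, expired, total, description) tuple."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 3)  # the description may contain spaces
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        status, expired, total = (int(p) for p in parts[:3])
        description = parts[3] if len(parts) > 3 else ""
        rows.append((status, expired, total, description))
    return rows


def summarize(rows):
    """Return (all URLs, URLs with status 200, URLs with any other status)."""
    total = sum(row[2] for row in rows)
    ok = sum(row[2] for row in rows if row[0] == 200)
    return total, ok, total - ok


if __name__ == "__main__":
    total, ok, other = summarize(parse_stats(STATS))
    print(f"total={total} ok={ok} non-200={other} ({100 * other / total:.1f}%)")
```

On this data, 3558 of the 8296 URLs (about 42.9%) have a status other than 200.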
My problem is that these 211 never get indexed, and the 301, 302, 403, and 404
entries are always there, taking up unnecessary disk space and other ASPseek
and MySQL resources. What I want to do is remove every entry that is NOT
"Status 200". How can I do this without breaking ASPseek? I know you can't just
delete them, and I'm not going to hand-type every URL that has a non-200 status
and try ./index -c "http://url/" on each one. That would take years!
These stats are on only 5,000 URLs. I plan to index thousands more with
similar status results. At this point I don't care if the "Not indexed yet"
entries ever get indexed, but I sure would like to know why they never get
indexed and how to remove all these non-200 entries.
Anyone know how?
end