Guys, let's move such discussions to aseek-users. To the point: URLs with non-200 status don't occupy much disk space; in fact, each one is only a single record in the urlword table. This data is needed to reindex the URL correctly next time, since its status may well change from 404 to 200.
So, the situation with non-200 status URLs is NORMAL. There is absolutely no need to worry.

If you want to index not-indexed-yet URLs (status 0), use
index -s 0

Karen Barnes wrote:
> How does one clean things up? Here's my example of real data:
>
> ASPseek database statistics
>
> Status  Expired  Total
> -----------------------------
>      0      211    211  Not indexed yet
>    200        0   4738  OK
>    301        0    129  Moved Permanently
>    302        0    311  Moved Temporarily
>    403        0      5  Forbidden
>    404        0   2902  Not found
> -----------------------------
>  Total      211   8296
>
> My problem is that these 211 never get indexed, and the 301, 302, 403 and
> 404s are always there, taking up unnecessary disk space and other aspseek
> and mysql resources. What I want to do is remove all that are NOT
> "Status 200". How can I do this without breaking aspseek? I know you can't
> just delete them, and I'm not going to hand-type all the URLs that have a
> non-200 status and try a ./index -c "http://url/" for each. That would
> take years!
>
> These stats are on only 5,000 URLs. I plan to index thousands more with
> similar status results. At this point I don't care if the "Not Yet
> Indexed" ever get indexed, but I sure would like to know why they never
> get indexed and how to remove all these non-200 statuses.

--
[EMAIL PROTECTED] ICQ7551596 [EMAIL PROTECTED]
-- Guinness a Day Keeps a Doctor Away (people's wisdom)
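
To put rough numbers on the point that each URL is just one urlword record, the statistics quoted in this thread can be tallied with a small Python sketch. The names STATS and summarize are hypothetical and are not part of ASPseek; the figures are copied by hand from the quoted table, and nothing here talks to the real database.

```python
# Figures copied by hand from the statistics quoted in this thread:
# HTTP status -> (expired, total). Not read from a live ASPseek database.
STATS = {
    0: (211, 211),    # Not indexed yet
    200: (0, 4738),   # OK
    301: (0, 129),    # Moved Permanently
    302: (0, 311),    # Moved Temporarily
    403: (0, 5),      # Forbidden
    404: (0, 2902),   # Not found
}


def summarize(stats):
    """Split the per-status totals into OK, not-yet-indexed, and everything else."""
    total = sum(count for _expired, count in stats.values())
    ok = stats.get(200, (0, 0))[1]
    pending = stats.get(0, (0, 0))[1]
    return {
        "total": total,
        "ok": ok,
        "not_indexed_yet": pending,
        # Redirects, forbidden and not-found URLs: one record each.
        "other_status": total - ok - pending,
    }


summary = summarize(STATS)
print(summary)
```

Under these assumptions, the "other_status" figure is simply the number of urlword records kept so that those URLs can be rechecked on a later run, and "not_indexed_yet" is what index -s 0 would pick up.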
