Guys, let's move such discussions to aseek-users.

To the point: URLs with a non-200 status don't occupy much
disk space - in fact each one is only a single record in the urlword table.
That record is needed to reindex the URL correctly next time -
its status may well change from 404 to 200.

So, the situation with non-200 status URLs is NORMAL. There is
absolutely no need to worry.

If you want to index the URLs that have not been indexed yet (status 0), use
    index -s 0
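
To see how many rows each status really accounts for, something like the
following rough sketch can help. It runs against an in-memory SQLite stand-in
rather than the real MySQL database. Only the table name urlword comes from
this message; the column names url and status, the sample URLs and the helper
rows_per_status are assumptions made for illustration.

```python
import sqlite3

# In-memory stand-in for the real database. The table name urlword is taken
# from the message; the columns url and status are assumed, not confirmed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urlword (url TEXT PRIMARY KEY, status INTEGER)")
conn.executemany(
    "INSERT INTO urlword (url, status) VALUES (?, ?)",
    [
        ("http://example.com/", 200),
        ("http://example.com/old", 404),
        ("http://example.com/moved", 301),
        ("http://example.com/new", 0),
    ],
)


def rows_per_status(connection):
    """Return {status: row count}; each URL contributes exactly one row."""
    cursor = connection.execute(
        "SELECT status, COUNT(*) FROM urlword GROUP BY status"
    )
    return dict(cursor.fetchall())


counts = rows_per_status(conn)
for status in sorted(counts):
    print(status, counts[status])
```

Since every URL is a single row whatever its status, the per-status totals
here are simply row counts, which is why the non-200 entries stay cheap.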

Karen Barnes wrote:
> How does one clean things up? Here's my example of real data:
> 
> ASPseek database statistics
> 
>    Status    Expired      Total
>   -----------------------------
>         0        211        211 Not indexed yet
>       200          0       4738 OK
>       301          0        129 Moved Permanently
>       302          0        311 Moved Temporarily
>       403          0          5 Forbidden
>       404          0       2902 Not found
>   -----------------------------
>     Total        211       8296
> 
> My problem is that these 211 never get indexed, and the 301s, 302s, 403s
> and 404s are always there, taking up unnecessary disk space and other
> aspseek and mysql resources. What I want to do is remove all that are NOT
> "Status 200". How can I do this without breaking aspseek? I know you can't
> just delete them, and I'm not going to hand type all the URLs that have a
> non-200 status and try a ./index -c "http://url/" on each. That would take years!
> 
> These stats are on only 5,000 URLs. I plan to index thousands more with 
> similar status results. At this point I don't care if the "Not Yet 
> Indexed" ever get indexed, but I sure would like to know why they never 
> get indexed and how to remove all these non status 200.


-- 
-- [EMAIL PROTECTED]  ICQ7551596  [EMAIL PROTECTED] --
    Guinness a Day Keeps a Doctor Away (people's wisdom)
