How does one actually clean up the entire database of URLs whose status is not 200? Here is the relevant scenario from ./index:
Database repairing options:
  -X1   Check inverted index for deleted URLs
  -X2   Fix database to delete deleted URLs
  -H    Recreate citation indexes and ranks

Use "index -X1" to check the inverted index for URLs for which the
"urlword.deleted" field is non-zero. Use "index -X2" to fix it by
appending information about the deleted keys to the delta files. So if
you want to remove records where "urlword.deleted" is non-zero, run
index -X2; index -D, and finally perform SQL statements to delete the
unnecessary records.

My question is: what "SQL statements to delete unnecessary records" do
I use to completely clean up the database tables while keeping aspseek
happy at the same time? I would like to truly clean up the database of
URLs that have not yet been indexed or that returned any status other
than 200. I know that if I just go into mysql and do this:

  DELETE FROM urlword WHERE status != 200;

it is not going to clean up the other tables that need cleaning, and
it will probably break aspseek's ability to function properly. So:
what needs to be done, and in what order?

I would be happy to write a Perl script to do this maintenance, and to
clean up the stats table as well, if only I knew everything that needs
to be done; a rough sketch of what I have in mind is below. Basically
I would like to keep things as compact as possible and keep the
database tables optimized. Any help in this matter would be
appreciated.
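Here is the kind of script I mean. This is only a sketch: the
connection details are made up, the urlword DELETE is taken from my
example above, and the rest of the @cleanup list is a placeholder for
exactly the statements I am asking about:

  #!/usr/bin/perl
  # Sketch only: connection details are invented, and @cleanup is a
  # placeholder for the statements I don't yet know.
  use strict;
  use warnings;
  use DBI;

  my $dbh = DBI->connect('DBI:mysql:database=aspseek;host=localhost',
                         'aspseek', 'secret',
                         { RaiseError => 1, AutoCommit => 1 });

  # Let index flag the deleted URLs and merge the delta files first,
  # as the documentation quoted above describes.
  system('index -X2') == 0 or die "index -X2 failed: $?";
  system('index -D')  == 0 or die "index -D failed: $?";

  # The SQL cleanup. Deleting from urlword alone is not enough --
  # the other statements needed here are the part I don't know.
  my @cleanup = (
      'DELETE FROM urlword WHERE status != 200',
      # ... statements for the other tables (stats, etc.) go here ...
  );
  $dbh->do($_) for @cleanup;

  # Compact the tables afterwards to keep them optimized.
  $dbh->do('OPTIMIZE TABLE urlword');
  $dbh->disconnect;

In particular, is that the right order -- the index passes first and
the SQL deletes afterwards -- or does some of the SQL have to happen
before the merge?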
Thanks,
John