I have a web archive that constantly has new web pages added to it. 
Each web page that I want indexed has a filename of the form
msgxxxxx.html (where xxxxx is a unique number).

My first attempt at having new pages indexed was to run a crontab 
job calling index.  The aspseek.conf file has an include line for 
the server statements:
include /home/mhonarc/aspseek_server_start_url

That file contains lines like this:

AuthBasic    listone:
Server  http://www.internet-tools.com/listone/

The aspseek.conf file also has these lines:

Allow msg.*\.html$ \/$
Disallow .*

This works fine the first time I run the index.

The next time the index program is called, no new URLs are found to 
process, despite new messages having been added to the web site.
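
As a sanity check that the patterns themselves accept new message 
pages, here is a minimal Python sketch of the Allow/Disallow lines 
above. It assumes the rules are tried in order with the first match 
winning, and that a pattern may match anywhere in the URL; I have not 
confirmed that this is how index evaluates them. The decide helper 
and the msg0000x.html URLs are made up for illustration.

```python
import re

# Patterns copied from the Allow/Disallow lines in aspseek.conf.
# Assumption: rules are tried in order, the first match decides, and
# a pattern may match anywhere in the URL (not anchored at the start).
RULES = [
    ("Allow", re.compile(r"msg.*\.html$")),
    ("Allow", re.compile(r"\/$")),
    ("Disallow", re.compile(r".*")),
]


def decide(url):
    """Return the action of the first rule whose pattern matches url."""
    for action, pattern in RULES:
        if pattern.search(url):
            return action
    # Assumed default; not reached here because ".*" matches everything.
    return "Allow"


# Hypothetical URLs, for illustration only.
base = "http://www.internet-tools.com/listone/"
for url in (base, base + "msg00001.html", base + "msg00002.html",
            base + "index.html"):
    print(decide(url), url)
```

If that assumption holds, the directory URL and any new msgxxxxx.html 
page both pass the filter, so the patterns do not look like the 
reason the new pages are skipped.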

Does anybody have any suggestions on how to get around this bug, or 
another way to index recently added web pages?

Thanks

mark




