Greetings all,

If htdig is interrupted, it creates the file specified in url_log (db.log by default) to hold the URLs it has seen but not yet visited.  If this file exists, its URLs are added to the next pass of the digging (provided -i isn't used).
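For anyone unfamiliar with the attribute, it goes in the config file like any other; the path below is just a placeholder, not the compiled-in default:

    # Illustrative htdig.conf excerpt -- the path is only a placeholder.
    url_log:    /var/lib/htdig/db/db.log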
My question: is there a way to ensure that these URLs and their descendants are visited first?  If so, that could be pushed as a work-around for the slow digging: every day a script could start an incremental dig and kill it after X hours.  If the dig is guaranteed to continue where it left off, the daily digs could still be run during off-peak times.

If the URLs are reordered too much, though, some pages might never get processed.  Avoiding that would probably require the file to list two classes of URLs: those processed so far, and those seen but not yet processed.  For large data sets, that might slow the exit time down considerably.

Thoughts?

Lachlan
-- 
[EMAIL PROTECTED]
ht://Dig developer DownUnder (http://www.htdig.org)
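P.S.  To make the idea concrete, here is a rough sketch of the kind of cron-driven wrapper I have in mind.  It is only an illustration, not something in the tree: it assumes htdig is on the PATH, that the config file lives at /etc/htdig/htdig.conf, and that terminating the process counts as an "interruption" for the url_log behaviour described above.

    #!/usr/bin/env python3
    # Rough sketch only: paths, flags and the time limit are placeholders.
    import subprocess

    CONFIG = "/etc/htdig/htdig.conf"   # assumed location of the config file
    LIMIT_HOURS = 4                    # the "X hours" budget mentioned above

    # Start an incremental dig (no -i), so an existing url_log is picked up.
    dig = subprocess.Popen(["htdig", "-c", CONFIG])
    try:
        dig.wait(timeout=LIMIT_HOURS * 3600)
    except subprocess.TimeoutExpired:
        # Out of time: interrupt the dig.  Per the behaviour described above,
        # it should then write the unvisited URLs to url_log and exit.
        dig.terminate()
        dig.wait()

Run from cron at, say, 1am, something like this would confine the dig to off-peak hours while (hopefully) picking up each night where the previous one stopped.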
