Greetings all,

If htdig is interrupted, it creates the file specified in url_log 
(db.log by default), containing the URLs that were seen but not yet 
visited.  If this file exists, those URLs are added to the next dig 
(unless -i is used).
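
For anyone following along, url_log is an attribute in the htdig 
configuration file; a line like the one below (just spelling out the 
default named above) sets it explicitly:

    # where an interrupted dig records URLs seen but not yet retrieved
    url_log: db.log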

My question:  Is there a way to ensure that these URLs and their 
descendants are visited first?  If so, that could be promoted as a 
work-around for slow digging.  Every day, a script could start an 
incremental dig and kill it after X hours (a rough sketch follows 
below).  If the dig is guaranteed to continue where it left off, the 
daily digs could still be run during off-peak times.  However, if the 
URLs are reordered too much, some pages might never get processed.  
Preventing that would probably require the file to record two classes 
of URLs:  those processed so far, and those seen but not yet 
processed.  For large data sets, that might slow the exit time down 
considerably.
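
For concreteness, here is a rough, untested sketch of that daily 
script in Python.  It assumes htdig is on the PATH, that -c points at 
your config file, and that interrupting the process is enough to make 
it write url_log as described above; the config path and the X-hour 
budget are placeholders:

    #!/usr/bin/env python3
    # Start an update dig (no -i) and interrupt it after a fixed time budget,
    # relying on url_log to let the next day's run continue where this one stopped.
    import signal
    import subprocess

    CONF = "/etc/htdig/htdig.conf"   # placeholder config path
    BUDGET = 4 * 60 * 60             # "X hours" -- here 4 -- in seconds

    proc = subprocess.Popen(["htdig", "-c", CONF])
    try:
        proc.wait(timeout=BUDGET)               # finished within the budget
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGINT)         # interrupt; url_log should be written
        proc.wait()

Whether SIGINT is the right signal (as opposed to SIGTERM) would need 
checking against how htdig actually handles interruption.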

Thoughts?

Lachlan

-- 
[EMAIL PROTECTED]
ht://Dig developer DownUnder  (http://www.htdig.org)


