Re: [htdig] digging 1000 of pages

Geoff Hutchison Thu, 27 Mar 2003 09:09:06 -0800

On 27 Mar 2003, Didelot Loic wrote:

> a) Can I start htdig more than once to get everything indexed faster?


Not on the same set of database files. You can certainly try splitting the
URL space into smaller portions, running htdig once per portion (each with
its own database files), and then using htmerge to merge the databases
together.
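
As a rough sketch of the split-and-merge idea, the Python below groups start
URLs by host, writes out one config per group with its own database_dir, and
lists the commands you would run. The helper names, file names and paths are
made up for illustration. start_url and database_dir are htdig attributes;
the exact htdig/htmerge flags (in particular -m for merging) are my
assumption here and should be checked against the docs for your version.

```python
from urllib.parse import urlsplit


def partition_urls(start_urls, parts):
    """Group start URLs into at most `parts` buckets, one host per bucket."""
    hosts = sorted({urlsplit(url).netloc for url in start_urls})
    buckets = [[] for _ in range(parts)]
    for url in start_urls:
        # All URLs from the same host land in the same bucket.
        index = hosts.index(urlsplit(url).netloc) % parts
        buckets[index].append(url)
    return [bucket for bucket in buckets if bucket]


def render_config(urls, database_dir):
    """Render a minimal config; each portion gets its own database_dir."""
    return "database_dir: {}\nstart_url: {}\n".format(
        database_dir, " ".join(urls)
    )


def plan_commands(config_paths):
    """List the commands to dig each portion, then merge into the first.

    Assumption: `htmerge -c target.conf -m other.conf` merges the databases
    described by other.conf into those described by target.conf.
    """
    commands = []
    for path in config_paths:
        commands.append(["htdig", "-i", "-c", path])
        commands.append(["htmerge", "-c", path])
    for path in config_paths[1:]:
        commands.append(["htmerge", "-c", config_paths[0], "-m", path])
    return commands
```

Each command list could be handed to subprocess.run, but as noted below,
running the digs in parallel is not guaranteed to be any faster.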

There's no guarantee that running multiple processes will index things
faster. Remember that there are limits like network speed, hard drive
transfer rate, RAM, etc. that probably limit your indexing speed more. If
you're indexing on the server itself, you'll probably get much better bang
for your buck out of the local_urls attribute:

http://www.htdig.org/attrs.html#local_urls
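
To make the local_urls idea concrete: the attribute takes URL-prefix=path
pairs, so that matching documents can be read from disk rather than fetched
over HTTP. The sketch below only imitates that prefix mapping and builds the
config line; it is not htdig's own code, and the host name and directory are
placeholders.

```python
def local_urls_line(mapping):
    """Build a local_urls config line from {url_prefix: local_directory}."""
    pairs = ["{}={}".format(prefix, path) for prefix, path in mapping.items()]
    return "local_urls: " + " ".join(pairs)


def to_local_path(url, mapping):
    """Return the file path a URL would map to, or None if no prefix matches.

    This is an illustration of the prefix substitution only.
    """
    for prefix, path in mapping.items():
        if url.startswith(prefix):
            return path + url[len(prefix):]
    return None
```

With a mapping of http://www.example.com/ to /var/www/html/, the URL
http://www.example.com/docs/a.html would correspond to
/var/www/html/docs/a.html.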

See Q4.4 and 4.5 in the FAQ for a bit more on multiple databases:
http://www.htdig.org/FAQ.html#q4.4
http://www.htdig.org/FAQ.html#q4.5

> b) What happens if I kill htdig while indexing?  Is then everything
> lost?

If you're running version 3.1.6 or using the -l flag in versions
3.1.0-3.1.5, htdig will spit out a log file and die gracefully.

(The 3.2 beta snapshots do this as well.)

> c) If I restart the htdig process later where will it start indexing?
> Does it start where it has stopped?

Pretty much. It will read in the log file and use this to form the initial
queue of URLs to index.
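
If it helps to picture the stop-and-resume behaviour, here is a generic
sketch of the pattern: on shutdown, the URLs still waiting are written to a
file, and on restart that file seeds the queue. The one-URL-per-line file
and the function names are hypothetical; this is not htdig's actual log
format or code.

```python
import os


def save_pending(path, pending_urls):
    """Write the URLs that were still queued when the run was stopped."""
    with open(path, "w", encoding="utf-8") as handle:
        for url in pending_urls:
            handle.write(url + "\n")


def initial_queue(path, start_urls):
    """Seed the queue from a saved file if present, else from start_urls."""
    if not os.path.exists(path):
        return list(start_urls)
    with open(path, encoding="utf-8") as handle:
        pending = [line.strip() for line in handle if line.strip()]
    # An empty file means nothing was left over, so start from scratch.
    return pending or list(start_urls)
```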

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/



