On Tue, 2 Jul 2002, Rylan W. Hazelton wrote:

> I let it run for about 8hrs and it only dug about 20% of them.  I need
> to find a way to make the indexing more palatable to the server and was
> hoping someone can help me here.

I'm curious why you're using 3.2. Indexing in 3.2 is, at the moment,
certainly slower than in 3.1, because it indexes and stores a significantly
larger amount of information. Plus, it assembles the databases on the fly
rather than requiring the separate htmerge step.

> 1) Run a big dig (all 1M posts) then, run nightly digs of the posts in
> the last 24-36 hours, then merge the dbs.

You should also take a look at the -m flag to htdig. This will only index
a set of URLs and do nothing else. (Valid for 3.1.6 and 3.2 betas.)

<http://www.htdig.org/htdig.html>
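For the nightly run, something has to produce the list of recent URLs.
Here's a rough Python sketch of that step, assuming your forum can give you
(post ID, last-modified time) pairs and that the file handed to htdig -m
holds one URL per line. The URL pattern and function names are made up for
illustration; check the htdig page above for the exact -m usage.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical URL pattern for a single post; adjust to your site.
POST_URL = "http://www.example.com/posts/{id}"

def recent_post_urls(posts, now, window_hours=36):
    """Return URLs of posts modified within the last `window_hours` hours.

    `posts` is an iterable of (post_id, last_modified) pairs, where
    last_modified is a timezone-aware datetime (an assumption about how
    your forum exposes its data).
    """
    cutoff = now - timedelta(hours=window_hours)
    return [POST_URL.format(id=pid) for pid, modified in posts
            if modified >= cutoff]

def write_url_file(path, urls):
    # One URL per line: the layout assumed here for the file given to -m.
    with open(path, "w") as f:
        for url in urls:
            f.write(url + "\n")

now = datetime(2002, 7, 2, 12, 0, tzinfo=timezone.utc)
posts = [(1, now - timedelta(hours=10)), (2, now - timedelta(hours=40))]
print(recent_post_urls(posts, now))  # only post 1 is inside the window
```

A 36-hour window overlaps the previous night's run a bit, so a late edit is
less likely to slip between two digs.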

> 2) break the posts up into ~50-100k page block and index them all
> separately, then merge the dbs.

This depends on how much load your server and CGIs can handle. If you
think the server can handle indexing two sets at once, this will be
faster. If you'd have to do one set, then another, etc., then this will
definitely be slower.
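If you do split things up, here's a rough Python sketch of carving the post
ID range into 50k blocks and grouping them into rounds of digs run side by
side. It assumes contiguous numeric post IDs; the function names are
hypothetical, and turning each block into its own start URL list or config
is left to you.

```python
def id_blocks(first_id, last_id, block_size=50_000):
    """Split the inclusive range first_id..last_id into consecutive blocks."""
    blocks = []
    start = first_id
    while start <= last_id:
        end = min(start + block_size - 1, last_id)
        blocks.append((start, end))
        start = end + 1
    return blocks

def rounds(blocks, parallel):
    """Group blocks into rounds of `parallel` digs run at the same time."""
    return [blocks[i:i + parallel] for i in range(0, len(blocks), parallel)]

blocks = id_blocks(1, 1_000_000)
print(len(blocks), len(rounds(blocks, 2)))  # 20 blocks, 10 rounds of two
```

The point of the rounds: with parallel=1 you're doing one set after
another, which is where the extra per-set overhead makes things slower
than a single big dig.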

> Also how can I search multiple dbs at once in 3.2?  Are there any docs
> for 3.2?

The installation you have should have full documentation. From a source
.tar.gz, it will be in htdoc/. If you installed from a binary package, it
should include docs as well. Beyond that,
see: <http://www.htdig.org/dev/htdig-3.2/>

To search multiple databases at the same time, you'll need to set up
"collections." You should specify multiple config names to htsearch,
separated by "|" characters. You could also specify one "master" config
with a collection_names attribute.

<http://www.htdig.org/dev/htdig-3.2/attrs.html#collection_names>
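As a quick illustration of the "|" approach, here's a Python sketch that
builds an htsearch query URL searching two configs at once. The host, CGI
path, and config names are placeholders; in an HTML search form the
equivalent would be a hidden "config" field whose value is
"posts_a|posts_b".

```python
from urllib.parse import urlencode

# Placeholder location of the htsearch CGI on your server.
HTSEARCH = "http://www.example.com/cgi-bin/htsearch"

def collection_search_url(configs, words):
    """Build an htsearch URL that searches several configs as one collection."""
    # Config names are joined with "|"; urlencode escapes it as %7C,
    # which the CGI decodes back to "|".
    query = urlencode({"config": "|".join(configs), "words": words})
    return HTSEARCH + "?" + query

print(collection_search_url(["posts_a", "posts_b"], "indexing speed"))
```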

> If anyone knows where I can find the correct format for the headers it
> would be much appreciated.

These are standard Last-Modified: headers:
http://www.w3.org/Protocols/HTTP/Object_Headers.html#last-modified
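The date has to be in HTTP-date form, always in GMT. If your CGI happens to
be in Python, a minimal sketch of producing the header from a stored
timestamp might look like this (treating naive timestamps as UTC is an
assumption; adjust if yours are local time):

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def last_modified_header(modified):
    """Format a datetime as an HTTP Last-Modified header line."""
    if modified.tzinfo is None:
        modified = modified.replace(tzinfo=timezone.utc)  # assume UTC
    modified = modified.astimezone(timezone.utc)
    # usegmt=True emits the "GMT" zone name that HTTP dates require.
    return "Last-Modified: " + format_datetime(modified, usegmt=True)

print(last_modified_header(datetime(2002, 7, 2, 12, 0)))
# Last-Modified: Tue, 02 Jul 2002 12:00:00 GMT
```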

But in order for the stored date to be useful for speeding up indexing,
the server/CGI would need to recognize If-Modified-Since: headers:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
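Roughly, the CGI compares the date the client sends against the post's
last-modified time and answers "304 Not Modified" with no body when nothing
has changed. Here's a minimal Python sketch of that check, assuming a
CGI-style environment where the request header shows up as
HTTP_IF_MODIFIED_SINCE; the function names are made up for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def not_modified(if_modified_since, last_modified):
    """True if the client's copy is still current, so a 304 can be sent."""
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # RFC 2616: an invalid date means a normal full response.
        return False
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so ignore microseconds.
    return last_modified.replace(microsecond=0) <= since

def cgi_status(environ, last_modified):
    """Pick the status line for a GET on a post (hypothetical helper)."""
    ims = environ.get("HTTP_IF_MODIFIED_SINCE")
    return "304 Not Modified" if not_modified(ims, last_modified) else "200 OK"
```

Skipping the body on a 304 is where the savings come from: htdig doesn't
have to download and re-parse a post that hasn't changed since the last dig.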

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
