According to Dan Langille:
> I have a website with a large number of pages. Rather than reindex the
> whole site as new pages are added, I'm looking for a solution which
> involves indexing only the new page. The FAQ holds a glimmer of hope.
>
> http://www.htdig.org/FAQ.html#q4.5 shows that htmerge can merge
> multiple databases. My proposal is to use htdig to index the new page,
> then use htmerge to merge the new index into the old index. Does that
> sound like a plan?
>
> The next step: if a page changes, reindex it, then merge. That sounds
> like a good idea, but I fear it may not be possible. I'm suspecting, but
> haven't determined yet, that merging is for disjoint databases which do
> not overlap. Is that correct?
This should work fine. htmerge is designed to handle the case where a
URL appears in both databases, and it uses the most recently updated
record. There are some problems in htmerge that can affect merging, though,
so I recommend you install this patch:
ftp://ftp.ccsf.org/htdig-patches/3.1.5/words-db.cc-rundig.0
You may also be interested in this patch
ftp://ftp.ccsf.org/htdig-patches/3.1.5/htdump-htload.0
which adds a -m (minimal) option, whereby you can provide a file of URLs
to be updated, without having to attempt updates on every other URL in
the database.
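
For what it's worth, the index-then-merge sequence described above can be
sketched as a small Python helper that only builds the command lines. This
is a rough sketch under some assumptions: the two config file names are
placeholders, the small config is assumed to have its own start_url and
database_dir, and the use of "htmerge -m" to merge one config's databases
into another (after running htmerge on the small database first) is my
reading of the 3.1.x documentation, so please check it against the docs
for your version before relying on it.

```python
import shlex


def incremental_update_commands(main_conf, new_conf):
    """Build the command lines for indexing new pages separately and merging.

    main_conf and new_conf are placeholder paths to two ht://Dig config
    files; new_conf is assumed to point at its own database directory.
    """
    return [
        # Index only what new_conf's start_url reaches, from scratch (-i).
        ["htdig", "-i", "-c", new_conf],
        # Build the word index for the small database.
        ["htmerge", "-c", new_conf],
        # Assumed merge step: fold the small database into the main one.
        ["htmerge", "-c", main_conf, "-m", new_conf],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed.
    for cmd in incremental_update_commands("main.conf", "new.conf"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing rather than executing keeps the sketch safe to try on a machine
where the main database matters; once the commands look right they could
be passed to subprocess.run one at a time.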
However, even without the -m option, running htdig without -i is
significantly faster than running it with -i, because it only reindexes
documents that have changed, and fairly quickly skips over the ones that
haven't changed since the last run. This is quicker still if you use
local_urls rather than going through HTTP.
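
To illustrate the local_urls idea: the attribute maps a URL prefix to a
directory, in the form "local_urls: http://www.example.com/=/var/www/html/"
(host name and path here are placeholders), so that matching documents
can be read straight from disk. The Python sketch below shows only that
prefix substitution. It is an illustration of the concept, not ht://Dig's
actual code, and the function name is made up for the example.

```python
def local_path_for(url, local_urls):
    """Return the filesystem path for url, or None if no prefix matches.

    local_urls is a dict mapping a URL prefix to a local directory,
    mirroring the prefix=directory pairs of the config attribute.
    """
    for prefix, directory in local_urls.items():
        if url.startswith(prefix):
            # Swap the URL prefix for the directory; keep the rest of the path.
            return directory + url[len(prefix):]
    # No mapping applies, so the document would have to be fetched over HTTP.
    return None


if __name__ == "__main__":
    mapping = {"http://www.example.com/": "/var/www/html/"}
    print(local_path_for("http://www.example.com/docs/new.html", mapping))
```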
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930