According to Dan Langille:
> I have a website with a large number of pages.  Rather than reindex the 
> whole site as new pages are added, I'm looking for a solution which 
> involves indexing only the new page.  The FAQ holds a glimmer of hope. 
> 
> http://www.htdig.org/FAQ.html#q4.5 shows that htmerge can merge 
> multiple databases.  My proposal is to use htdig to index the new page, 
> then use htmerge to merge the new index into the old index.  Does that 
> sound like a plan?
> 
> The next step: if a page changes, reindex it, then merge.  That sounds 
> like a good idea, but I fear it may not be possible.  I'm suspecting, but 
> haven't determined yet, that merging is for disjoint databases which do 
> not overlap.  Is that correct?

This should work fine.  htmerge is designed to handle the case where a
URL appears in both databases, and it uses the most recently updated
record.  There are some problems in htmerge that can affect merging, though,
so I recommend you install this patch:

   ftp://ftp.ccsf.org/htdig-patches/3.1.5/words-db.cc-rundig.0

You may also be interested in this patch

   ftp://ftp.ccsf.org/htdig-patches/3.1.5/htdump-htload.0

which adds a -m (minimal) option, whereby you can provide a file of URLs
to be updated, without having to attempt updates on every other URL in
the database.

However, even without the -m option, running htdig without -i is
significantly faster than running it with -i.  The -i option starts from
scratch and reindexes everything, whereas without it htdig only reindexes
documents that have changed, and fairly quickly skips over the ones that
haven't changed since the last run.  This is quicker still if you use
local_urls to read the files from the local filesystem rather than going
through HTTP.
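
To make the two approaches above concrete, here is a rough sketch in
Python that only builds (and optionally runs) the command lines.  The file
names main.conf and new.conf are hypothetical: I'm assuming new.conf is a
separate config whose start_url lists just the new page and whose
database_dir differs from the main one.  The only options it relies on are
-c, htdig's -i and htmerge's -m.  Check the exact sequence against the
htmerge documentation for your version, including whether the small
database needs its own plain htmerge pass before the merge.

```python
import subprocess


def merge_commands(main_conf, new_conf):
    """Command lines for the index-the-new-page-then-merge approach.

    main_conf and new_conf are hypothetical config file paths.
    """
    return [
        # -i: build the small database from scratch, ignoring any old one.
        ["htdig", "-i", "-c", new_conf],
        # -m: merge the databases described by new_conf into those of main_conf.
        ["htmerge", "-c", main_conf, "-m", new_conf],
    ]


def update_command(main_conf):
    """Command line for a plain update run of the main database.

    No -i here, so htdig reuses the existing database and only
    reindexes documents that have changed.
    """
    return ["htdig", "-c", main_conf]


def run_all(commands, dry_run=True):
    """Print each command; execute it too unless dry_run is set."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run: just show what would be executed.
    run_all(merge_commands("main.conf", "new.conf"))
    run_all([update_command("main.conf")])
```

With dry_run left at its default, this just prints the commands so you can
look them over before running anything against your real databases.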

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

