According to Steve Bankowitz:
> I'm using ht://Dig 3.1.5 on my Solaris 8 box.  I can't seem to get the
> digging part to my liking.  (Or maybe it is the htmerge phase?)
> 
> In the htdig.conf file I have (among other things):
> 
>   start_url:              `/var/tmp/sort.txt`
>   limit_urls_to:          ${start_url}
> 
> In the `/var/tmp/sort.txt` file I have about 10000 lines of something
> like this:
> 
>   http://www.mydomain.com/item.rs?ID=1
>   http://www.mydomain.com/item.rs?ID=2
>   ...
>   http://www.mydomain.com/item.rs?ID=10000
> 
> How do I add a new link to be indexed without indexing the whole 10000
> other files?
> 
> If I create a brand new `/var/tmp/sort.txt` file with:
> 
>   http://www.mydomain.com/item.rs?ID=10001
> 
> The resultant database file will overwrite the other 10000 entries.
> How do I merge them together as one database?  I tried using `htmerge
> -m`, but I don't think that is what I want.  (Or is it?)  I also tried
> using `htdig -a` in my `rundig` script, but all that did was just made a
> backup of the database files first.
> 
> Any suggestions or pointers to RTFMs most appreciated.

There are three ways to add new documents to an existing database:

1) update digs, running htdig without -i (the initial-dig flag)

2) minimal update digs, running htdig without -i, but with -m and
   giving it a file containing a list of URLs to dig

3) digging a separate database, and merging it into the main one
   with htmerge -m

Option 1 works pretty well for static pages.  htdig will quickly
check all documents to see which have been updated, and only reparse
the ones that have.  It doesn't work well with dynamic content, though,
because dynamic pages typically give htdig no reliable modification
date, so it will always think all URLs have been updated.  With your
item.rs?ID=... URLs, it looks like you'll have that problem.

Option 2 may do what you want, but it needs the -m option to htdig,
so you'll have to run the latest snapshot of htdig 3.1.6 or 3.2.0b4
rather than 3.1.5.  You'll also need to set up a new file listing
only the URLs to be added to the database (it looks like you're
doing this already).
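
As a rough sketch of option 2, assuming a snapshot that supports
htdig -m: the paths and the two function names below are made up for
illustration, and running htmerge after the dig is my assumption based
on what rundig does in 3.1.x.  The second function only prints the
commands, so you can look them over before running anything.

```shell
#!/bin/sh
# Hypothetical paths; adjust to your own installation.
CONF=/opt/www/htdig/conf/htdig.conf
NEW_URLS=/var/tmp/new_urls.txt

# Write one URL per line for each item ID given as an argument.
write_new_urls() {
    : > "$NEW_URLS"
    for id in "$@"; do
        printf 'http://www.mydomain.com/item.rs?ID=%s\n' "$id" >> "$NEW_URLS"
    done
}

# Print (not run) the commands for a minimal update dig.
# Note there is no -i, so the existing database is kept.
minimal_update_cmds() {
    printf 'htdig -c %s -m %s\n' "$CONF" "$NEW_URLS"
    printf 'htmerge -c %s\n' "$CONF"
}
```

For example, "write_new_urls 10001" followed by
"minimal_update_cmds | sh" would dig just the one new URL.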

Option 3 works with older versions of the code, including 3.1.5,
but it takes a bit more work to set up.  I'm not sure why you think
this isn't what you want, but you may have come to that conclusion
because you weren't doing it right.  Read the htmerge documentation
carefully, and be sure you set up the two config files correctly
(each pointing to its own database files) and give them to htmerge
in the right order: the main database's config with -c, and the
config of the database to be merged in with -m.
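
A minimal sketch of option 3, under some assumptions: all paths and
both function names are hypothetical, the second config file is
stripped down to three attributes (a real one would probably carry
over the rest of your htdig.conf settings), and running htmerge on
the new database before merging is my reading of the htmerge
documentation, so check it against your version.

```shell
#!/bin/sh
# Hypothetical paths; adjust to your own installation.
MAIN_CONF=/opt/www/htdig/conf/htdig.conf
NEW_CONF=/var/tmp/htdig-new.conf
NEW_DB_DIR=/var/tmp/htdig-new
NEW_URLS=/var/tmp/new_urls.txt

# Write a second config whose database_dir differs from the main one,
# so digging it cannot overwrite the existing database.
write_new_conf() {
    cat > "$NEW_CONF" <<EOF
database_dir:   $NEW_DB_DIR
start_url:      \`$NEW_URLS\`
limit_urls_to:  \${start_url}
EOF
}

# Print (not run) the commands: dig the new database from scratch,
# build its index, then merge it into the main database.
merge_cmds() {
    printf 'htdig -i -c %s\n' "$NEW_CONF"
    printf 'htmerge -c %s\n' "$NEW_CONF"
    printf 'htmerge -c %s -m %s\n' "$MAIN_CONF" "$NEW_CONF"
}
```

The last command is where the order matters: -c names the database
that receives the merge, and -m names the one being merged in.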

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
