On 1 Oct 2001 at 14:12, Dan Langille wrote:

> etc.  I suspect the lack of a trailing / is causing the redirect
> as mentioned above and then htdig never retries the 
> redirected URL.  Does that sound plausible?

I think this procedure may be helpful to others who will 
want to do similar things.

My start_url pointed to http://www.unixathome.org/adsl/.
Originally, the links for 2001_05..2001_09 did not contain
trailing /'s.  I suspect this is why htdig didn't include them
in the results.
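
For anyone who wants to catch such links before digging, here is a
rough Python sketch that adds the missing trailing / to URLs that
look like directories.  The function name and the "no dot in the
last path segment means directory" rule are my own assumptions for
illustration; htdig itself does nothing like this.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_trailing_slash(url: str) -> str:
    """Append '/' to a URL whose path looks like a directory.

    Heuristic (an assumption, not htdig behaviour): a final path
    segment that contains no '.' is treated as a directory name.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    # An empty last segment means the path already ends in '/'.
    if last_segment and "." not in last_segment:
        path += "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(ensure_trailing_slash(
        "http://www.unixathome.org/adsl/archives/2001_05"))
```

A link such as archives/2001_05 gains the slash, while a URL that
already ends in / or names a file like index.html is left alone.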

I modified my htdig configuration file to contain this:

start_url: http://www.unixathome.org/adsl/archives/2001_05/
             http://www.unixathome.org/adsl/archives/2001_06/

[that's all one line but wrapped for mailing]
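
If more months need adding later, the line can be generated rather
than typed.  This is only a sketch: ARCHIVE_BASE and
start_url_line() are names I made up for the example, and it simply
joins the URLs with spaces on a single line, each with its
trailing /.

```python
# Base of the monthly archive directories (taken from the URLs above).
ARCHIVE_BASE = "http://www.unixathome.org/adsl/archives/"


def start_url_line(months):
    """Build a single-line start_url entry for the given month names.

    Each month (e.g. "2001_05") becomes a URL ending in '/', so the
    server has no reason to redirect the request.
    """
    urls = [f"{ARCHIVE_BASE}{month}/" for month in months]
    return "start_url: " + " ".join(urls)


if __name__ == "__main__":
    print(start_url_line(["2001_05", "2001_06"]))
```

The output can be pasted into the configuration file in place of
the existing start_url line.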

Then I re-ran the dig:

$ htdig -vvv -a -i -s -c /home/dan/htdig-unixathome.org-adsl.conf > htdig.out

Q: I suspect that since I was merely adding previously omitted
URLs, I should not have used the -i option.  This was an update
dig.  Correct?

Then I re-ran the merge:

$ htmerge -vvv -a -s -c /home/dan/htdig-unixathome.org-adsl.conf > htmerge.out

Examining the output of both steps shows that 2001_05 and
2001_06 have both been included in the results.

Given these files, is there a way to merge them?  Or
did I mess that opportunity up with my last htdig?

# ls /usr/local/share/htdig/un*
/usr/local/share/htdig/unixathome-adsl.docdb
/usr/local/share/htdig/unixathome-adsl.docdb.work
/usr/local/share/htdig/unixathome-adsl.docs.index
/usr/local/share/htdig/unixathome-adsl.docs.index.work
/usr/local/share/htdig/unixathome-adsl.wordlist
/usr/local/share/htdig/unixathome-adsl.wordlist.work
/usr/local/share/htdig/unixathome-adsl.words.db
/usr/local/share/htdig/unixathome-adsl.words.db.work

Thanks.
-- 
Dan Langille
The FreeBSD Diary - http://freebsddiary.org/ - practical examples

