> Here's a philosophical question - it takes me close to four hours to
> reindex things via rundig.sh -- would there be any advantage to creating a
> new database each time, rather than appending to the existing database via
> the -a switch? Would it be more stable or any faster? Could I create an
> alternate config that points to a second database directory, then copy those
> over to the one I use for searching?

You're confused as to what the -a flag does. It simply allows you to work on
alternate copies of your databases, so htsearch can access the data while
you're indexing.

It's most certainly faster to update databases (e.g. as done by the
rundig.sh script) than to create them from scratch. On the other hand, if
you have some way of knowing the subset of files that have changed (e.g. in
mailing list archives), then you will find it faster to use htdig to index
only those files and then htmerge them in. (Or use the new -m flag to
htdig.)

<http://www.htdig.org/dev/htdig-3.2/htmerge.html>
<http://www.htdig.org/dev/htdig-3.2/htdig.html>

(As usual, these documents are always included with each release in the
htdoc/ directory...)

On Sat, 3 Nov 2001, Phil Glatz wrote:

> At 06:20 PM 11/3/2001 -0600, you wrote:
> >At 8:33 AM -0800 11/3/01, Phil Glatz wrote:
> >>Nov 3 05:04:31 citynews /kernel: pid 51125 (htdig), uid 0 on /: file
> >>system full
> >>Nov 3 05:04:31 citynews /kernel: pid 51125 (htdig), uid 0: exited on
> >>signal 11
> >>
> >>This is happening at the time htmerge is called, which I understand calls
> >>sort.
> >
> >No, I'm a bit confused by this--it says the PID is "htdig," which is
> >clearly not "htmerge."
>
> After a later rundig.sh execution, user root (who was running the process)
> got this cron error message:
>
> Start time: Sat Nov 3 08:53:54 PST 2001
> Segmentation fault
> Done Digging: Sat Nov 3 09:01:08 PST 2001
> Done Merging: Sat Nov 3 09:01:08 PST 2001
> End time: Sat Nov 3 09:01:08 PST 2001
>
> This correlates with /var/log/messages:
> Nov 3 09:01:09 citynews /kernel: pid 81988 (htdig), uid 0 on /: file
> system full
> Nov 3 09:01:09 citynews /kernel: pid 81988 (htdig), uid 0: exited on signal 11
>
> So I'm assuming htdig terminated with the full filesystem, and htmerge
> executed at 9:01:08
>
>
>
> >Two questions: First, I'm assuming the databases aren't on /.
>
> No - they are on /usr (a separate filesystem) - my htdig base is /usr/local/tmp
>
> My TMPDIR is set to /usr/tmp
>
> >Second, do you use external parsers, converters or transport scripts?
>
> No, I run it through a slightly modified /rundig.sh
>
>
> >Do you see files in /tmp when the message is created?
>
> I see no temp artifacts in /usr/tmp, /tmp, or /
>
> ----------------------------------
>
> I did another run with -v on, and htdig crapped out with this message:
> 1983:21661:1:http://bennington.citynews.com/1031.html: --------FATAL
> ERROR:Compressor::get_vals invalid comptype
>
> the messages log shows:
> Nov 3 12:21:14 citynews /kernel: pid 36290 (htdig), uid 0: exited on
> signal 11 (core dumped)
>
>
> This is apparently another issue. I tried creating a page with a single
> link to the above URL and deleting the database files and running rundig.sh
> again, thinking perhaps there was something on the page causing the error,
> but it worked fine.
>
>
> Here's a philosophical question - it takes me close to four hours to
> reindex things via rundig.sh -- would there be any advantage to creating a
> new database each time, rather than appending to the existing database via
> the -a switch? Would it be more stable or any faster? Could I create an
> alternate config that points to a second database directory, then copy those
> over to the one I use for searching?
>
> many thanks, Phil
>
>

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
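
The "alternate copies" behavior of -a described in the reply can be sketched roughly as follows. This is only an illustration, not the actual rundig.sh logic: it assumes the work copies are written next to the live database files with a ".work" suffix, and that once indexing finishes each work file is renamed over its live counterpart so htsearch picks up the new data. The function name promote_work_files and any directory you pass to it are hypothetical.

```python
import os
from pathlib import Path


def promote_work_files(db_dir):
    """Rename each '<name>.work' file in db_dir to '<name>'.

    Assumption: the indexer wrote its alternate copies with a '.work'
    suffix in the same directory as the live database files.
    Returns the names of the live files that were replaced or created.
    """
    promoted = []
    for work_file in sorted(Path(db_dir).glob("*.work")):
        # with_suffix("") strips only the trailing ".work",
        # so "db.docdb.work" becomes "db.docdb".
        live_file = work_file.with_suffix("")
        # os.replace overwrites the destination; on POSIX the rename is
        # atomic when source and destination are on the same filesystem.
        os.replace(work_file, live_file)
        promoted.append(live_file.name)
    return promoted
```

Calling promote_work_files on the database directory after a successful run swaps in the new files. Because the work copies sit beside the live ones, the filesystem holding the databases needs room for both sets during indexing.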

