Hi,
A little while ago we started using a new rundig shell script to create
our indexes, with the aim of making them searchable while the indexing
is taking place. This seems to work fine; however, I've noticed that the
db.wordlist file seems very small compared to the other files. Is this
normal?
Here are the files I've got:
-rw-r--r-- 1 root other 162883584 Jun 23 03:24 db.docdb
-rw-r--r-- 1 root other 162883584 Jun 23 03:24 db.docdb.work
-rw-r--r-- 1 root other 4434944 Jun 23 03:24 db.docs.index
-rw-r--r-- 1 root other 17 Jun 23 03:22 db.wordlist
-rw-r--r-- 1 root other 2048 Jun 23 03:22 db.words.db
Here's the script I run:
#!/bin/sh
# rundig.sh
# a script to drive ht://Dig updates
# Copyright (c) 1998 Colin Viebrock <[EMAIL PROTECTED]>
# Copyright (c) 1998-1999 Geoff Hutchison <[EMAIL PROTECTED]>
if [ "$1" = "-v" ]; then
    verbose="-v"
fi
# This is the directory where htdig lives
BASEDIR=/htdig
# This is the db dir
DBDIR=$BASEDIR/db/newsite
# This is the name of a temporary report file
REPORT=/htdig/htdig.report
# This is who gets the report
REPORT_DEST="[EMAIL PROTECTED]"
export REPORT_DEST
# This is the subject line of the report
SUBJECT="cron: htdig report for domain"
# This is the name of the conf file to use
CONF=config.conf
# This is the directory htdig will use for temporary sort files
TMPDIR=/tmp
export TMPDIR
# This is the PATH used by this script. Change it if you have problems
# with not finding wc or grep.
PATH=/usr/local/bin:/usr/bin:/bin
##### Dig phase
STARTTIME=`date`
echo Start time: $STARTTIME
echo rundig: Start time: $STARTTIME > $REPORT
$BASEDIR/bin/htdig $verbose -s -a -c $BASEDIR/conf/$CONF >> $REPORT
TIME=`date`
echo Done Digging: $TIME
echo rundig: Done Digging: $TIME >> $REPORT
##### Merge Phase
$BASEDIR/bin/htmerge $verbose -s -a -c $BASEDIR/conf/$CONF >> $REPORT
TIME=`date`
echo Done Merging: $TIME
echo rundig: Done Merging: $TIME >> $REPORT
##### Cleanup Phase
# To enable htnotify or the soundex search, uncomment the following lines
# $BASEDIR/bin/htnotify $verbose >>$REPORT
# $BASEDIR/bin/htfuzzy $verbose soundex
# Move 'em into place. Since we only need db.wordlist to do update digs
# and we always use -a, we just leave it as .work
mv $BASEDIR/db/newsite/db.wordlist.work $BASEDIR/db/newsite/db.wordlist
# We need the .work for next time as an update dig, plus the copy for searching
cp $BASEDIR/db/newsite/db.docdb.work $BASEDIR/db/newsite/db.docdb
# These are generated from htmerge, so we don't want copies of them.
mv $BASEDIR/db/newsite/db.docs.index.work $BASEDIR/db/newsite/db.docs.index
mv $BASEDIR/db/newsite/db.words.db.work $BASEDIR/db/newsite/db.words.db
END=`date`
echo End time: $END
echo rundig: End time: $END >> $REPORT
echo
# Grab the important statistics from the report file
# All lines begin with htdig: or htmerge:
fgrep "htdig:" $REPORT
echo
fgrep "htmerge:" $REPORT
echo
fgrep "rundig:" $REPORT
echo
WC=`wc -l $REPORT`
echo Total lines in $REPORT: $WC
# Send out the report ...
mail -s "$SUBJECT - $STARTTIME" $REPORT_DEST < $REPORT
# ... and clean up
rm $REPORT
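For what it's worth, a size check just before the "Move 'em into place" step would at least flag a tiny word list in the report. This is only a rough sketch: the check_size helper and the MIN_WORDLIST threshold are hypothetical names of my own, not part of the stock rundig script, and the 1024-byte default is an arbitrary guess.

```shell
#!/bin/sh
# Hypothetical helper, not part of the stock rundig script.
# check_size FILE MIN_BYTES: succeeds only if FILE exists and is
# at least MIN_BYTES bytes long.
check_size() {
    file=$1
    min=$2
    [ -f "$file" ] || return 1
    # Some wc implementations pad the count with spaces, so strip them.
    size=`wc -c < "$file" | tr -d ' '`
    [ "$size" -ge "$min" ]
}

# Same database directory as in the script above; MIN_WORDLIST is an
# assumed threshold and should be tuned to the site being indexed.
DBDIR=${DBDIR:-/htdig/db/newsite}
MIN_WORDLIST=${MIN_WORDLIST:-1024}

if check_size "$DBDIR/db.wordlist.work" "$MIN_WORDLIST"; then
    echo "rundig: db.wordlist.work looks sane"
else
    echo "rundig: db.wordlist.work is missing or under $MIN_WORDLIST bytes"
fi
```

Because the messages start with "rundig:", they would be picked up by the fgrep summary if they were also appended to $REPORT.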
All help much appreciated,
cheers,
Tom.
-----------------------------------------------------------------------
Tom Freeman Web: http://www.niss.ac.uk
Web Developer Tel: +44 1225 323789
NISS - Division of Eduserv