Hi,
A little while ago we started using a new rundig shell script to create
our indexes, with the aim of keeping them searchable while indexing
is taking place. This seems to work fine; however, I've noticed that the
db.wordlist file is very small (17 bytes) compared to the other files.
Is this normal?

Here are the files I've got:
-rw-r--r--   1 root     other    162883584 Jun 23 03:24 db.docdb
-rw-r--r--   1 root     other    162883584 Jun 23 03:24 db.docdb.work
-rw-r--r--   1 root     other    4434944 Jun 23 03:24 db.docs.index
-rw-r--r--   1 root     other         17 Jun 23 03:22 db.wordlist
-rw-r--r--   1 root     other       2048 Jun 23 03:22 db.words.db
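To spot this kind of thing automatically, a rough sketch of a size check follows. The helper name check_min_size and the 1024-byte threshold in the usage line are my own assumptions, not anything that ships with ht://Dig, and I don't know what a "normal" minimum size would be for these files.

```shell
#!/bin/sh
# Hypothetical helper (not part of ht://Dig):
#   check_min_size FILE MIN_BYTES
# Returns 0 if FILE exists and is at least MIN_BYTES long, 1 otherwise.
check_min_size() {
    file=$1
    min=$2
    if [ ! -f "$file" ]; then
        echo "MISSING: $file"
        return 1
    fi
    # wc -c may pad its output with spaces on some systems; strip them
    size=`wc -c < "$file" | tr -d ' '`
    if [ "$size" -lt "$min" ]; then
        echo "TOO SMALL: $file ($size bytes, expected at least $min)"
        return 1
    fi
    echo "ok: $file ($size bytes)"
    return 0
}
```

For example, `check_min_size /htdig/db/newsite/db.wordlist 1024` would flag the 17-byte file listed above, assuming 1024 bytes is a sensible lower bound.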

Here's the script I run:
#!/bin/sh

# rundig.sh
# a script to drive ht://Dig updates
# Copyright (c) 1998 Colin Viebrock <[EMAIL PROTECTED]>
# Copyright (c) 1998-1999 Geoff Hutchison <[EMAIL PROTECTED]>

if [ "$1" = "-v" ]; then
    verbose="-v"
fi

# This is the directory where htdig lives
BASEDIR=/htdig

# This is the db dir
DBDIR=$BASEDIR/db/newsite

# This is the name of a temporary report file
REPORT=/htdig/htdig.report

# This is who gets the report
REPORT_DEST="[EMAIL PROTECTED]"
export REPORT_DEST

# This is the subject line of the report
SUBJECT="cron: htdig report for domain"

# This is the name of the conf file to use
CONF=config.conf

# This is the directory htdig will use for temporary sort files
TMPDIR=/tmp
export TMPDIR

# This is the PATH used by this script. Change it if you have problems
#  with not finding wc or grep.
PATH=/usr/local/bin:/usr/bin:/bin

##### Dig phase
STARTTIME=`date`
echo Start time: $STARTTIME
echo rundig: Start time:   $STARTTIME > $REPORT
$BASEDIR/bin/htdig $verbose -s -a -c $BASEDIR/conf/$CONF >> $REPORT
TIME=`date`
echo Done Digging: $TIME
echo rundig: Done Digging: $TIME >> $REPORT

##### Merge Phase
$BASEDIR/bin/htmerge $verbose -s -a -c $BASEDIR/conf/$CONF >> $REPORT
TIME=`date`
echo Done Merging: $TIME
echo rundig: Done Merging: $TIME >> $REPORT

##### Cleanup Phase
# To enable htnotify or the soundex search, uncomment the following lines
# $BASEDIR/bin/htnotify $verbose >>$REPORT
# $BASEDIR/bin/htfuzzy $verbose soundex

# Move 'em into place. Since we only need db.wordlist to do update digs
# and we always use -a, we just leave it as .work
mv $BASEDIR/db/newsite/db.wordlist.work $BASEDIR/db/newsite/db.wordlist
# We need the .work for next time as an update dig, plus the copy for searching
cp $BASEDIR/db/newsite/db.docdb.work $BASEDIR/db/newsite/db.docdb
# These are generated from htmerge, so we don't want copies of them.
mv $BASEDIR/db/newsite/db.docs.index.work $BASEDIR/db/newsite/db.docs.index
mv $BASEDIR/db/newsite/db.words.db.work $BASEDIR/db/newsite/db.words.db

END=`date`
echo End time: $END
echo rundig: End time:     $END >> $REPORT
echo 

# Grab the important statistics from the report file
# All lines begin with htdig: or htmerge:
fgrep "htdig:" $REPORT  
echo 
fgrep "htmerge:" $REPORT
echo
fgrep "rundig:" $REPORT
echo

# Read from stdin so wc prints only the count, not the file name again
WC=`wc -l < $REPORT`
echo Total lines in $REPORT: $WC

# Send out the report ...
mail -s "$SUBJECT - $STARTTIME" $REPORT_DEST < $REPORT

# ... and clean up
rm $REPORT
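One change I'm considering for the "move 'em into place" step, as a sketch only: refuse to replace a live file when its .work counterpart is missing or empty. The helper name install_work is hypothetical (mine, not from ht://Dig), and it assumes the FILE / FILE.work naming used in the script above. It would not catch a tiny but non-empty file, such as a 17-byte one.

```shell
#!/bin/sh
# Hypothetical helper (not part of ht://Dig):
#   install_work MODE TARGET
# MODE is "mv" or "cp". TARGET.work is moved or copied over TARGET,
# but only if TARGET.work exists and is non-empty.
install_work() {
    mode=$1
    target=$2
    work=$target.work
    # -s is true only for an existing file with size greater than zero
    if [ ! -s "$work" ]; then
        echo "rundig: skipping $target, $work is missing or empty" >&2
        return 1
    fi
    case $mode in
        mv) mv "$work" "$target" ;;
        cp) cp "$work" "$target" ;;
        *)  echo "rundig: unknown mode $mode" >&2
            return 2 ;;
    esac
}
```

In the script this would be called as, for instance, `install_work mv $DBDIR/db.words.db` or `install_work cp $DBDIR/db.docdb`, using the DBDIR variable that is already defined near the top.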

All help much appreciated,
cheers,
Tom.

-----------------------------------------------------------------------
Tom Freeman                                  Web: http://www.niss.ac.uk
Web Developer                                      Tel: +44 1225 323789
NISS - Division of Eduserv

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
