After many months of promising to do this, I finally did it.  I've 
managed to index my main site, then add a single new document to that 
index.  It took several hours to figure out how to do it.  Most of 
that was spent fighting my preconceptions about the settings required.  
My notes are below.

My website has only about 500 pages, so reindexing the whole site 
doesn't take that long.  But indexing one of my mailing list archives 
normally takes three hours or so for the whole site.  That is the 
website for which I will be using the merge process outlined below.

My questions:

Does this make sense?

Can you see any simplifications to make it easier for others?

Cheers.



THE NOTES BEGIN:


To index the whole site:

./rundig

This creates the index files at 
/usr/local/share/htdig/databases/freebsddiary/

When new documents are added to my site, I modify merge.start.url to 
contain the new URLs.  Then I run:

./rundig.merge

This creates the index files at 
/usr/local/share/htdig/databases/freebsddiary/merge and then merges them 
with the main index files at 
/usr/local/share/htdig/databases/freebsddiary/.
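
A minimal sketch of that update step, assuming merge.start.url holds one 
URL per line and that overwriting it on each run is acceptable.  The 
function name queue_urls is hypothetical and not part of ht://Dig:

```shell
#!/bin/sh
# Hypothetical helper: overwrite a start-URL file with the given URLs,
# one per line.
queue_urls() {
    file=$1
    shift
    : > "$file"                 # truncate, so old URLs are not dug again
    for url in "$@"
    do
        printf '%s\n' "$url" >> "$file"
    done
}

# Example, run from the configuration directory (uncomment to use):
# queue_urls merge.start.url http://diary.unixathome.org/ottawa-pics.php
# [ -x ./rundig.merge ] && ./rundig.merge
```

Truncating first keeps each merge run limited to the newly added pages.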

I've taken the original rundig script and cut it down for my needs:

###
[dan@xeon:/home/freebsddiary/configuration/htdig] $ more rundig
#!/bin/sh

#
# rundig
#
# $Id: rundig,v 1.1 2002/01/01 18:49:45 dan Exp $
#
# This is a sample script to create a search database for ht://Dig.
#
DBDIR=/usr/local/share/htdig/databases/freebsddiary/
CONFIG=/home/freebsddiary/configuration/htdig/htdig-freebsddiary.org.conf

COMMONDIR=/usr/local/share/htdig
BINDIR=/usr/local/bin

#
# Set the TMPDIR variable if you want htmerge to put files in a location
# other than the default.  This is important if you do not have enough
# disk space for the big sort that htmerge runs.  Also, be aware that
# on some systems, /tmp is a memory mapped filesystem that takes away
# from virtual memory.
#
TMPDIR=$DBDIR
export TMPDIR

$BINDIR/htdig   -i -vvv -c ${CONFIG}
$BINDIR/htmerge    -vvv -c ${CONFIG}
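# NOTE: $alt is not set anywhere in this cut-down script, so unless it
# comes from the environment the block below does nothing.
case "$alt" in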
-a)
  ( cd $DBDIR && test -f db.docdb.work &&
    for f in *.work
    do
        mv -f $f `basename $f .work`
    done ) ;;
esac
###

That script takes no parameters.  Everything is hardcoded into the 
script.  It creates a new index from scratch.
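
A quick sanity check after a full run could look like the sketch below.  
It assumes the document database is named db.docdb, which is what the 
db.docdb.work test in the script suggests; the function name index_built 
is hypothetical:

```shell
#!/bin/sh
# Hypothetical check: succeed only if the document database exists and is
# non-empty in the given directory.
index_built() {
    [ -s "$1/db.docdb" ]
}

# Example, using the DBDIR from the script above (uncomment to use):
# index_built /usr/local/share/htdig/databases/freebsddiary ||
#     echo "index missing or empty" >&2
```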

Here are some relevant entries from 
/home/freebsddiary/configuration/htdig/htdig-freebsddiary.org.conf:

###
database_dir:           /usr/local/share/htdig/databases/freebsddiary
database_base:          ${database_dir}/db
start_url:              `start.url`
###
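
Since DBDIR in the script repeats database_dir from the config, one 
possible simplification is to read the value from the file.  This is only 
a sketch: conf_value is a hypothetical helper, it returns the raw text 
after the attribute name, and it does not expand ${...} references or 
backquoted file includes the way ht://Dig itself does:

```shell
#!/bin/sh
# Hypothetical helper: print the raw value of an attribute from an
# ht://Dig config file.  $1 = config file, $2 = attribute name.
conf_value() {
    sed -n "s/^$2:[[:space:]]*//p" "$1" | head -n 1
}

# Example (uncomment to use):
# DBDIR=$(conf_value "$CONFIG" database_dir)
```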

The rundig.merge script differs slightly: htdig runs without -i, against 
the merge config, and htmerge is given the -m option:

###
[dan@xeon:/home/freebsddiary/configuration/htdig] $ more rundig.merge
#!/bin/sh

#
# rundig
#
# $Id: rundig,v 1.1 2002/01/01 18:49:45 dan Exp $
#
# This is a sample script to create a search database for ht://Dig.
#
DBDIR=/usr/local/share/htdig/databases/freebsddiary/merge
CONFIG=/home/freebsddiary/configuration/htdig/htdig-freebsddiary.org.conf
CONFIGMERGE=/home/freebsddiary/configuration/htdig/htdig-freebsddiary.org.merge.conf

COMMONDIR=/usr/local/share/htdig
BINDIR=/usr/local/bin
#
# Set the TMPDIR variable if you want htmerge to put files in a location
# other than the default.  This is important if you do not have enough
# disk space for the big sort that htmerge runs.  Also, be aware that
# on some systems, /tmp is a memory mapped filesystem that takes away
# from virtual memory.
#
TMPDIR=$DBDIR
export TMPDIR

$BINDIR/htdig   -vvv  -c ${CONFIGMERGE}

$BINDIR/htmerge -vvv  -c ${CONFIG} -m ${CONFIGMERGE}
echo "done merge"
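# NOTE: as in rundig, $alt is not set in this script, so unless it comes
# from the environment the block below does nothing.
case "$alt" in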
-a)
  ( cd $DBDIR && test -f db.docdb.work &&
    for f in *.work
    do
        echo "moving $f to `basename $f .work`"
        mv -f $f `basename $f .work`
    done ) ;;
esac
###

Here are the relevant entries from the merge .conf file used above:

###
database_dir:           /usr/local/share/htdig/databases/freebsddiary/merge
database_base:          ${database_dir}/db
start_url:              `merge.start.url`
###
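
The two config files differ only in database_dir and start_url, so the 
merge config could in principle be generated from the main one.  The 
sketch below is an assumption about how that might be done, not how the 
files above were produced; make_merge_conf is a hypothetical name, and 
it expects the main file to use `start.url` and a database_dir with no 
trailing slash:

```shell
#!/bin/sh
# Hypothetical helper: derive the merge config from the main config.
# $1 = main config file, $2 = merge config file to write.
make_merge_conf() {
    sed -e 's|^\(database_dir:.*\)$|\1/merge|' \
        -e 's|^\(start_url:[[:space:]]*\)`start\.url`|\1`merge.start.url`|' \
        "$1" > "$2"
}

# Example (uncomment to use):
# make_merge_conf htdig-freebsddiary.org.conf htdig-freebsddiary.org.merge.conf
```

All other attributes are copied through unchanged, which keeps the two 
indexes built with the same settings.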

My start URLs are specified like this:

$ less merge.start.url
http://diary.unixathome.org/ottawa-pics.php
$ less start.url
http://diary.unixathome.org/


-- 
Dan Langille
The FreeBSD Diary - http://freebsddiary.org/ - practical examples


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
