According to [EMAIL PROTECTED]:
> Gilles Detillieux <[EMAIL PROTECTED]> writes:
> > The db.wordlist file is from the 3.1.x series, not 3.2.x.  Your
> > updatedig script is probably not updated correctly for 3.2.  Have a
> > look at the contrib/examples/rundig.sh script in your 3.2.0b4 source
> > snapshot for an example of a working update script.  (Hmm.  I just
> > noticed it's missing a command to copy the .work_weakcmpr file,
> > though, so you'd need to fix that.)
> 
> I already fixed rundig.sh adding a line to copy the .work_weakcmpr to 
> a .db_weakcmpr file. Actually, this is my "copy section":
> 
> cp $BASEDIR/db/db.docs.index.work $BASEDIR/db/db.docs.index
> cp $BASEDIR/db/db.docdb.work $BASEDIR/db/db.docdb
> cp $BASEDIR/db/db.excerpts.work $BASEDIR/db/db.excerpts
> cp $BASEDIR/db/db.words.db.work $BASEDIR/db/db.words.db
> cp $BASEDIR/db/db.words.db.work_weakcmpr $BASEDIR/db/db.words.db_weakcmpr
> 
> the only difference with the script supplied is in the first cp command 
> (docs.index database).

It's not really necessary to copy the db.docs.index.work file if you
always use htdig and htmerge with -a.  This file isn't used by htsearch
in the 3.2 code.  However, it's a good idea to test for the _weakcmpr
file before copying it, to avoid an error message if it's not there.
It's only created if needed.  Here's what I updated the rundig.sh
copy section to do:

cp $DBDIR/db.docdb.work $DBDIR/db.docdb
cp $DBDIR/db.excerpts.work $DBDIR/db.excerpts
cp $DBDIR/db.words.db.work $DBDIR/db.words.db
test -f $DBDIR/db.words.db.work_weakcmpr &&
  cp $DBDIR/db.words.db.work_weakcmpr $DBDIR/db.words.db_weakcmpr
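
For anyone who would rather keep this in one place, here is a rough sketch
of the same copy section wrapped in a shell function.  It is not what ships
in rundig.sh: the name copy_work_dbs is made up for illustration, the
database directory is passed as an argument instead of read from $DBDIR,
and unlike the plain cp lines above it stops at the first failed copy.

```shell
#!/bin/sh
# Sketch only: copy_work_dbs is a hypothetical helper, not part of ht://Dig.
# It promotes the .work databases in the given directory to the live copies.
copy_work_dbs() {
    dbdir=$1
    for name in db.docdb db.excerpts db.words.db; do
        # Assumption: give up on the first failed copy (the original
        # commands would simply carry on to the next cp).
        cp "$dbdir/$name.work" "$dbdir/$name" || return 1
    done
    # The _weakcmpr file is only created if needed, so test for it first
    # to avoid an error message when it is not there.
    if [ -f "$dbdir/db.words.db.work_weakcmpr" ]; then
        cp "$dbdir/db.words.db.work_weakcmpr" \
           "$dbdir/db.words.db_weakcmpr" || return 1
    fi
    return 0
}
```

In a script it would be called as: copy_work_dbs "$DBDIR"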

> ...Anyway, you said that rundig.sh builds databases from scratch, so I didn't
> use rundig.sh to update them.  I used updatedig (the one included in
> 3.2.0b4).

No, I said rundig builds databases from scratch, not rundig.sh.  I was
talking about the standard rundig script that's in installdir and gets
copied to your BIN_DIR by "make install".  This is very different from
any contributed script, despite the similar name.

> Well, the only differences between the script supplied and mine are:
> 
> mv /var/www/htdig/db/db.excerpts /var/www/htdig/db/db.excerpts.old
> mv /var/www/htdig/db/db.excerpts.work /var/www/htdig/db/db.excerpts
> mv /var/www/htdig/db/db.words.db_weakcmpr /var/www/htdig/db/db.words.db_weakcmpr.old
> mv /var/www/htdig/db/db.words.db.work_weakcmpr /var/www/htdig/db/db.words.db_weakcmpr
> 
> are missing, but there are:
> 
> mv /web/webdocs/htdig/db/db.wordlist /web/webdocs/htdig/db/db.wordlist.old
> mv /web/webdocs/htdig/db/db.wordlist.work /web/webdocs/htdig/db/db.wordlist
> 
> mv /web/webdocs/htdig/db/db.words.gdbm /web/webdocs/htdig/db/db.words.gdbm.old
> mv /web/webdocs/htdig/db/db.words.gdbm.work /web/webdocs/htdig/db/db.words.gdbm
> 
> that are useless, I suppose.

Yes.  The updatedig script in contrib/examples is actually written for
htdig 3.0.8b2 or older.  Contributed works are not always updated along
with the rest of the source, so they're frequently outdated, especially
with all the database changes that have taken place.  We should probably
just delete updatedig from the source trees as it's obsolete, and the
rundig.sh script does the right thing now.

In the 3.0 code, htdig used GDBM.  In 3.1, we switched to Berkeley DB,
but kept the file names similar (only the gdbm suffix was changed to db).
We also introduced excerpt compression and URL part encoding, which also
broke some contributed scripts.  In 3.2, we use a customized version of
the Berkeley DB package, and a very different set of DB files.  It's
been a challenge keeping everything in the source tree in sync with all
these changes, and for contributed code, where we're not even familiar
with its inner workings much of the time, we often just don't bother
and leave that as an exercise for the contributors and/or installers.

See contrib/README for more information.

We should probably get into the habit, before committing any contributed
script, of adding a comment to it indicating which version it was tested
with.

> Ok, I'm beginning to understand.  Now, if I want to launch rundig.sh once a
> month (at 00:00) and run updatedig every day (at 03:00), what changes do I
> need to make?
> Since I want to update the databases every day and rebuild them from
> scratch once a month, I suppose that I have to make some changes to my
> scripts.  First, I have to add the following line at the top of my rundig
> script:
> 
> rm $DBDIR/*
> 
> Then, in updatedig, I have to change the "move" commands into "copy" commands:
> 
> mv /var/www/htdig/db/db.docdb /var/www/htdig/db/db.docdb.old
> mv /var/www/htdig/db/db.docdb.work /var/www/htdig/db/db.docdb
> 
> so that "htdig -a" can find the .work databases every time.
> Am I right?

Why bother with updatedig at all?  It doesn't do what you want it to do,
and rundig.sh does do update digs correctly, with the one modification
for the weakcmpr file.  Why not just run rundig.sh daily and be done with
it?  Once a month you can remove the $DBDIR/db.*.work* files to force a
complete reindexing.

> > E.g. if you use contrib/examples/rundig.sh, which leaves copies of the
> > .work files around for next time, you could add a hook like this
> > before calling htdig in that script, to remove the .work files and
> > force a full reindexing at the start of the month:
> > 
> > case "`LC_TIME=C date`" in
> > Sun\ ???\ \ [1-7]\ *)       # remove old database on first Sunday of month
> >     rm -f $DBDIR/db.*.work $DBDIR/db.words.db.work_weakcmpr
> >     ;;
> > esac
> 
> ok, so if I add these few lines to my rundig.sh script, I could avoid using
> updatedig, couldn't I?

Yes, that's the whole point.  It just doesn't make sense to maintain and
run two different scripts that essentially do the same thing, and run the
risk of the two of them running simultaneously.  If you want to do the
full reindexing on the first day of the month, rather than the first
Sunday, you can either change the pattern in the case statement to
"???\ ???\ \ 1\ *)", or you can put the remove command as a separate
crontab entry a few minutes before the rundig.sh script runs (giving
it enough time to finish removing the files if they're large).
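
To make the first-day-of-the-month variant concrete, here is a minimal
sketch of that hook split into two small functions.  The names
is_first_of_month and purge_work_dbs are hypothetical, not anything in
the distributed scripts, and the check assumes the usual C-locale date
output, where the day of the month is padded with a space (e.g.
"Thu Mar  1 03:00:00 CST 2001").

```shell
#!/bin/sh
# Sketch only: both function names are made up for illustration.

# Succeeds if the given "LC_TIME=C date" style string falls on day 1.
is_first_of_month() {
    case "$1" in
    ???\ ???\ \ 1\ *) return 0 ;;   # any weekday, any month, day " 1"
    *)                return 1 ;;
    esac
}

# Removes the leftover .work files in the given directory, so that the
# next dig has nothing to update from and reindexes everything.
purge_work_dbs() {
    rm -f "$1"/db.*.work "$1"/db.words.db.work_weakcmpr
}
```

Before the htdig call in rundig.sh, it could be used like this:

    if is_first_of_month "`LC_TIME=C date`"; then
        purge_work_dbs "$DBDIR"
    fi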

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
