According to [EMAIL PROTECTED]:
> Gilles Detillieux <[EMAIL PROTECTED]> writes:
> > The db.wordlist file is from the 3.1.x series, not 3.2.x.  Your updatedig
> > script is probably not updated correctly for 3.2.  Have a look at the
> > contrib/examples/rundig.sh script in your 3.2.0b4 source snapshot for an
> > example of a working update script.  (Hmm.  I just noticed it's missing
> > a command to copy the .work_weakcmpr file, though, so you'd need to
> > fix that.)
>
> I already fixed rundig.sh adding a line to copy the .work_weakcmpr to
> a .db_weakcmpr file. Actually, this is my "copy section":
>
> cp $BASEDIR/db/db.docs.index.work $BASEDIR/db/db.docs.index
> cp $BASEDIR/db/db.docdb.work $BASEDIR/db/db.docdb
> cp $BASEDIR/db/db.excerpts.work $BASEDIR/db/db.excerpts
> cp $BASEDIR/db/db.words.db.work $BASEDIR/db/db.words.db
> cp $BASEDIR/db/db.words.db.work_weakcmpr $BASEDIR/db/db.words.db_weakcmpr
>
> the only difference with the script supplied is in the first cp command
> (docs.index database).

It's not really necessary to copy the db.docs.index.work file if you always
use htdig and htmerge with -a.  This file isn't used by htsearch in the 3.2
code.  However, it's a good idea to test for the _weakcmpr file before
copying it, to avoid an error message if it's not there.  It's only created
if needed.  Here's what I updated the rundig.sh copy section to do:

cp $DBDIR/db.docdb.work $DBDIR/db.docdb
cp $DBDIR/db.excerpts.work $DBDIR/db.excerpts
cp $DBDIR/db.words.db.work $DBDIR/db.words.db
test -f $DBDIR/db.words.db.work_weakcmpr && cp $DBDIR/db.words.db.work_weakcmpr $DBDIR/db.words.db_weakcmpr

> ...Anyway...you said that rundig.sh build databases from scratch, so I didn't
> use rundig.sh to update them . I used updatedig (the one included in the
> 3.2.0b4).

No, I said rundig builds databases from scratch.  I was talking about the
standard rundig script that's in installdir, and gets copied to your BIN_DIR
by "make install".  This is very different from any contributed script,
despite the similar name.

> Well, the only differences between the script supplied and mine are:
>
> mv /var/www/htdig/db/db.excerpts /var/www/htdig/db/db.excerpts.old
> mv /var/www/htdig/db/db.excerpts.work /var/www/htdig/db/db.excerpts
> mv /var/www/htdig/db/db.words.db_weakcmpr /var/www/htdig/db/db.words.db_weakcmpr.old
> mv /var/www/htdig/db/db.words.db.work_weakcmpr /var/www/htdig/db/db.words.db_weakcmpr
>
> miss.....
> but there are:
>
> mv /web/webdocs/htdig/db/db.wordlist /web/webdocs/htdig/db/db.wordlist.old
> mv /web/webdocs/htdig/db/db.wordlist.work /web/webdocs/htdig/db/db.wordlist
>
> mv /web/webdocs/htdig/db/db.words.gdbm /web/webdocs/htdig/db/db.words.gdbm.old
> mv /web/webdocs/htdig/db/db.words.gdbm.work /web/webdocs/htdig/db/db.words.gdbm
>
> that are useless, I suppose.

Yes.  The updatedig script in contrib/examples is actually written for
htdig 3.0.8b2 or older.
Contributed works are not always updated along with the rest of the source,
so they're frequently outdated, especially with all the database changes
that have taken place.  We should probably just delete updatedig from the
source trees as it's obsolete, and the rundig.sh script does the right
thing now.

In the 3.0 code, htdig used GDBM.  In 3.1, we switched to Berkeley DB, but
kept the file names similar (only the gdbm suffix was changed to db).  We
also introduced excerpt compression and URL part encoding, which also broke
some contributed scripts.  In 3.2, we use a customized version of the
Berkeley DB package, and a very different set of DB files.  It's been a
challenge keeping everything in the source tree in sync with all these
changes, and for contributed code, where we're not even familiar with its
inner workings much of the time, we often just don't bother and leave that
as an exercise for the contributors and/or installers.  See contrib/README
for more information.

We should probably get into the habit, before committing any contributed
script, of adding a comment to it indicating which version it was tested
with.

> Ok, I begin to understand...Ok, now if I would to launch rundig.sh one time a
> month (at 00:00) and I would to run updatedig everyday (at 03:00)...what
> changes I need to do?
> Since I want updating the databases everyday and I want to rebuild them from
> scratch one time a month, I suppose that I have to make some changes to my
> scripts....Initially I have to add the following line at the top of my rundig
> script:
>
> rm $DBDIR/*
>
> Then, in the updatedig, I've to change the "move" commands in "copy" commands :
>
> mv /var/www/htdig/db/db.docdb /var/www/htdig/db/db.docdb.old
> mv /var/www/htdig/db/db.docdb.work /var/www/htdig/db/db.docdb
>
> in order that the "htdig -a" could find the .work databases every time.
> I'm right?

Why bother with updatedig at all?
It doesn't do what you want it to do, and rundig.sh does do update digs
correctly, with the one modification for the weakcmpr file.  Why not just
run rundig.sh daily and be done with it?  Once a month you can remove the
$DBDIR/db.*.work* files to force a complete reindexing.

> > E.g. if you use contrib/examples/rundig.sh, which leaves copies of the
> > .work files around for next time, you could add a hook like this before
> > calling htdig in that script, to remove the .work files and force a full
> > reindexing at the start of the month:
> >
> > case "`LC_TIME=C date`" in
> > Sun\ ???\ \ [1-7]\ *) # remove old database on first Sunday of month
> >     rm -f $DBDIR/db.*.work $DBDIR/db.words.db.work_weakcmpr
> >     ;;
> > esac
>
> ok, so if I add these few lines to my rundig.sh script, I could avoid the using
> of updatedig, couldn't I?

Yes, that's the whole point.  It just doesn't make sense to maintain and
run two different scripts that essentially do the same thing, and run the
risk of the two of them running simultaneously.  If you want to do the full
reindexing on the first day of the month, rather than the first Sunday, you
can either change the pattern in the case statement to "???\ ???\ \ 1\ *)",
or you can put the remove command as a separate crontab entry a few minutes
before the rundig.sh script runs (giving it enough time to finish removing
the files if they're large).

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
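
P.S.  For reference, the two rundig.sh changes discussed above (the guarded
_weakcmpr copy and the first-of-the-month cleanup) could be sketched roughly
as below.  This is only an illustration, not the script as shipped: the
function names are hypothetical, the default DBDIR path is an assumption
borrowed from the paths quoted earlier, and the date pattern is the
first-day-of-month variant rather than the first-Sunday one.

```shell
#!/bin/sh
# Sketch only.  DBDIR default is an assumption; set it to your database directory.
DBDIR=${DBDIR:-/var/www/htdig/db}

# Hypothetical helper: succeeds if a C-locale `date` string is the 1st of the month.
# C-locale date pads single-digit days with a space, hence the two spaces before "1".
is_first_of_month() {
    case "$1" in
    ???\ ???\ \ 1\ *) return 0 ;;
    *) return 1 ;;
    esac
}

# Hypothetical helper: remove the .work files so the next dig reindexes everything.
remove_work_files() {
    rm -f "$DBDIR"/db.*.work "$DBDIR/db.words.db.work_weakcmpr"
}

# Hypothetical helper: copy the .work databases over the live ones.
copy_work_files() {
    for f in db.docdb db.excerpts db.words.db; do
        cp "$DBDIR/$f.work" "$DBDIR/$f"
    done
    # The _weakcmpr file is only created if needed, so test for it first.
    if test -f "$DBDIR/db.words.db.work_weakcmpr"; then
        cp "$DBDIR/db.words.db.work_weakcmpr" "$DBDIR/db.words.db_weakcmpr"
    fi
}

# Intended placement inside rundig.sh (left commented out in this sketch):
#   before calling htdig:   is_first_of_month "`LC_TIME=C date`" && remove_work_files
#   after the merge step:   copy_work_files
```

Run daily from cron, a script arranged this way does an update dig on most
days and a full reindexing on the 1st, so no second script is needed.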

