Gilles Detillieux <[EMAIL PROTECTED]> wrote:

[db.words.db_weakcmpr]

> > I used the rundig sample from 3.2.0b4 and there isn't any "move" for this
> > database in it. When I ran the script, everything seemed to work, but the
> > engine didn't find anything during searches. :((
> > I had my suspicions, so I renamed db.words.db.work_weakcmpr to
> > db.words.db_weakcmpr and... automagically, everything really worked :)))
> > Could someone please fix that script? :))
> 
> The script was fixed many months ago to do this (Jan 10 to be exact).
> I suspect you're still running an old copy of the script.  A "make
> install" will only copy the new version of rundig if the old version
> isn't around, to avoid clobbering a customized script.

Sorry Gilles, but I used the rundig.sh script that was in the
htdig-3.2.0b4-110401.tar.gz package :((

 
[db.worddump & db.docs]

> > But on the htdig site there was something about the db.wordlist database
> > (or ASCII file) and nothing about the databases I got :(( BTW, in the
> > updatedig script there was a move command involving db.wordlist and
> > db.wordlist.old, but I never found these files.
> 
> The db.wordlist file is from the 3.1.x series, not 3.2.x.  Your updatedig
> script is probably not updated correctly for 3.2.  Have a look at the
> contrib/examples/rundig.sh script in your 3.2.0b4 source snapshot for an
> example of a working update script.  (Hmm.  I just noticed it's missing
> a command to copy the .work_weakcmpr file, though, so you'd need to
> fix that.)

I had already fixed rundig.sh by adding a line that copies the .work_weakcmpr
file to .db_weakcmpr. This is my current "copy section":

cp $BASEDIR/db/db.docs.index.work $BASEDIR/db/db.docs.index
cp $BASEDIR/db/db.docdb.work $BASEDIR/db/db.docdb
cp $BASEDIR/db/db.excerpts.work $BASEDIR/db/db.excerpts
cp $BASEDIR/db/db.words.db.work $BASEDIR/db/db.words.db
cp $BASEDIR/db/db.words.db.work_weakcmpr $BASEDIR/db/db.words.db_weakcmpr

Apart from that added line, the only difference from the supplied script is
the first cp command (the docs.index database).
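
For reference, the same copy section can be written as a loop, which makes it
harder to leave out one database (as happened with the .work_weakcmpr file).
This is only a sketch: copy_work_dbs is a hypothetical helper name, and the
file names are simply the ones from the copy section above. Using cp rather
than mv keeps the .work files in place for a later htdig -a.

```shell
#!/bin/sh
# Sketch: copy each .work database to its live name.
# copy_work_dbs is a hypothetical helper; $1 is the db directory,
# e.g. copy_work_dbs "$BASEDIR/db"
copy_work_dbs() {
    dbdir=$1
    # Each entry is "work-file-name:live-file-name" (names from the copy section).
    for pair in \
        db.docs.index.work:db.docs.index \
        db.docdb.work:db.docdb \
        db.excerpts.work:db.excerpts \
        db.words.db.work:db.words.db \
        db.words.db.work_weakcmpr:db.words.db_weakcmpr
    do
        work=${pair%%:*}   # part before the colon
        live=${pair#*:}    # part after the colon
        # cp, not mv: the .work copy stays around for the next htdig -a
        cp "$dbdir/$work" "$dbdir/$live" || return 1
    done
}
```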

Anyway, you said that rundig.sh builds the databases from scratch, so I didn't
use it to update them; I used updatedig (the one included in 3.2.0b4).
The only differences between the supplied script and mine are that these
lines from my script:

mv /var/www/htdig/db/db.excerpts /var/www/htdig/db/db.excerpts.old
mv /var/www/htdig/db/db.excerpts.work /var/www/htdig/db/db.excerpts
mv /var/www/htdig/db/db.words.db_weakcmpr /var/www/htdig/db/db.words.db_weakcmpr.old
mv /var/www/htdig/db/db.words.db.work_weakcmpr /var/www/htdig/db/db.words.db_weakcmpr

are missing from the supplied one, which instead contains:

mv /web/webdocs/htdig/db/db.wordlist /web/webdocs/htdig/db/db.wordlist.old
mv /web/webdocs/htdig/db/db.wordlist.work /web/webdocs/htdig/db/db.wordlist

mv /web/webdocs/htdig/db/db.words.gdbm /web/webdocs/htdig/db/db.words.gdbm.old
mv /web/webdocs/htdig/db/db.words.gdbm.work /web/webdocs/htdig/db/db.words.gdbm

which, I suppose, are useless.

> 
> > 3. I modified the updatedig script to produce an update report every time.
> >
> > The first report has a lot of "not changed" and few "changed"
> > (...and "pushing"). The second and following reports were totally
> > different. They looked like the rundig report... no more changed/not
> > changed... just...
> ...
> > Could someone explain why? Why don't I simply get changed (--> pushing)/not
> > changed in my report?
> 
> It seems to me that your updatedig script isn't managing the .work files
> correctly, so htdig ends up reindexing from scratch.  htdig -a needs to
> have all the .work files in place in order to do an update dig, so the
> script needs either to leave these copies around, or copy them before
> running htdig -a.
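
Since htdig -a needs all the .work files in place, an update script could
check for them before digging and stop, rather than ending up with a full
reindex. A minimal sketch; have_work_files is a hypothetical name, and the
list assumes the five database files from the copy section in this thread.

```shell
#!/bin/sh
# Sketch: return non-zero (and list what is missing) unless every .work
# database exists in the directory given as $1.
# Hypothetical usage before the dig:  have_work_files "$DBDIR" || exit 1
have_work_files() {
    dbdir=$1
    missing=0
    for f in db.docs.index.work db.docdb.work db.excerpts.work \
             db.words.db.work db.words.db.work_weakcmpr
    do
        if [ ! -f "$dbdir/$f" ]; then
            echo "missing: $dbdir/$f" >&2
            missing=1
        fi
    done
    return $missing
}
```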

Hmm... yes, maybe something isn't working as it should.
You're right: my rundig.sh builds the .work files and then copies them to the
live databases. updatedig also uses htdig -a, and the first time it runs it
finds the .work files generated by htdig -a in rundig.sh... but then it moves
them onto the live databases:

mv /var/www/htdig/db/db.docdb /var/www/htdig/db/db.docdb.old
mv /var/www/htdig/db/db.docdb.work /var/www/htdig/db/db.docdb

This way, the next time I run the updatedig script, htdig -a doesn't find the
.work files and rebuilds the databases from scratch...

OK, I'm beginning to understand... Now, if I want to run rundig.sh once a
month (at 00:00) and updatedig every day (at 03:00), what changes do I need
to make? Since I want to update the databases every day and rebuild them
from scratch once a month, I suppose I have to change my scripts. First, I
have to add the following line at the top of my rundig script:

rm $DBDIR/*
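
One caution about that line: if DBDIR is unset or empty, rm $DBDIR/* expands
to rm /*. A slightly safer sketch of the same step uses the POSIX ${DBDIR:?}
expansion, which makes a non-interactive shell report an error and exit
instead of running rm. clear_db_dir is a hypothetical name.

```shell
#!/bin/sh
# Sketch: same intent as "rm $DBDIR/*", but aborts with an error message
# if DBDIR is unset or empty, so rm never sees "/*".
clear_db_dir() {
    rm -f "${DBDIR:?DBDIR is not set}"/*
}
```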

Then, in updatedig, I have to turn "move" commands like these into "copy" commands:

mv /var/www/htdig/db/db.docdb /var/www/htdig/db/db.docdb.old
mv /var/www/htdig/db/db.docdb.work /var/www/htdig/db/db.docdb

so that htdig -a can find the .work databases every time.
Am I right?
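
One possible shape for that move-to-copy change, as a sketch only: the live
database is still rotated to .old with mv, as in the original lines, but the
.work file is copied instead of moved, so it is still there for the next
htdig -a. publish_db and its argument order are hypothetical.

```shell
#!/bin/sh
# Sketch: publish a .work database without consuming it.
# $1 = db directory, $2 = .work file name, $3 = live file name
# Hypothetical usage:
#   publish_db /var/www/htdig/db db.docdb.work db.docdb
#   publish_db /var/www/htdig/db db.words.db.work_weakcmpr db.words.db_weakcmpr
publish_db() {
    dbdir=$1 work=$2 live=$3
    # keep the previous live copy as .old, like the original mv lines
    if [ -f "$dbdir/$live" ]; then
        mv "$dbdir/$live" "$dbdir/$live.old" || return 1
    fi
    # copy, don't move: the .work file stays in place for htdig -a
    cp "$dbdir/$work" "$dbdir/$live"
}
```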

> 
> > 4. Do I need a purge phase between digging and merging in the updatedig
> > script?
> >
> > In my update report I got a lot of "Not found:
> > http://www.unina.it/universit/....... Ref: http://www.unina.it/universit/....".
> > Do I need to purge all these references?
>
> Yes, again it seems you're running an outdated script.  You need the
> htpurge command after htdig.  You don't need htmerge unless you're merging
> two databases together.  That's all htmerge has done since 3.2.0b3.

:) In fact, I suspected as much. But I used the updatedig script supplied with
the 3.2.0b4-110401 snapshot, and these are the commands it contains:

/web/webdocs/htdig/bin/htdig -a -t $verbose -s
/web/webdocs/htdig/bin/htmerge -a $verbose -s
/web/webdocs/htdig/bin/htnotify $verbose
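
Following the advice above (htpurge after htdig, no htmerge for a single
database), the update part of updatedig might look like the following sketch.
BINDIR and verbose are assumed to be set by the caller, run_update is a
hypothetical name, the htdig flags are copied from the supplied script, and
passing -a to htpurge so that it works on the .work files is an assumption to
check against the htpurge documentation for your snapshot.

```shell
#!/bin/sh
# Sketch: dig, then purge, then notify; htmerge is dropped because there is
# only one database. $verbose is left unquoted on purpose: it may be empty.
run_update() {
    "$BINDIR/htdig" -a -t $verbose -s || return 1
    "$BINDIR/htpurge" -a $verbose || return 1   # -a here is an assumption
    "$BINDIR/htnotify" $verbose
}
```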


> E.g. if you use contrib/examples/rundig.sh, which leaves copies of the
> .work files around for next time, you could add a hook like this before
> calling htdig in that script, to remove the .work files and force a full
> reindexing at the start of the month:
> 
> case "`LC_TIME=C date`" in
> Sun\ ???\ \ [1-7]\ *) # remove old database on first Sunday of month
>       rm -f $DBDIR/db.*.work $DBDIR/db.words.db.work_weakcmpr
>       ;;
> esac
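
The case pattern in that hook relies on the C-locale date format (for example
"Sun Apr  1 00:00:00 CEST 2001"), where a one-digit day of the month is padded
with a space, so the pattern only matches Sundays that fall on days 1 to 7.
Wrapping it in a function makes it easy to try against sample dates;
is_first_sunday is a hypothetical name.

```shell
#!/bin/sh
# Sketch: succeed only if the date string ($1, or the current date) is a
# Sunday on day 1..7 of the month, using the same pattern as the hook.
# Hypothetical usage inside rundig.sh, before calling htdig:
#   is_first_sunday && rm -f $DBDIR/db.*.work $DBDIR/db.words.db.work_weakcmpr
is_first_sunday() {
    d=${1:-$(LC_TIME=C date)}
    case "$d" in
    Sun\ ???\ \ [1-7]\ *) return 0 ;;   # space-padded single-digit day
    *) return 1 ;;
    esac
}
```

If LC_ALL is set in the environment it overrides LC_TIME, so under a non-C
LC_ALL the date text, and therefore the match, may differ.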

OK, so if I add these few lines to my rundig.sh script, I can stop using
updatedig altogether, can't I?


Thank you very much for your help.


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
