According to [EMAIL PROTECTED]:
> There's someone who could help me plz?

Well, maybe, yes, but you have to be patient.  We're all busy people,
so you shouldn't expect quick responses, especially over a weekend!

> I installed the 3.2.0b4 and there's few things that I can't understand about 
> htdig databases. BTW I use the standard htdig.conf without any special 
> parameter for the databases.
> 
> 1. What's the use of db.words.db_weakcmpr?

My understanding is that it holds overflow records from the db.words.db
database, for cases where compression is weaker than expected.

> I used the rundig sample of the 3.2.0b4 and there isn't any "move" in it about 
> this database. When I ran the script, everything seemed to work but the engine 
> didn't find anything during the searches. :((
> I had some suspects, so I renamed the db.words.db.work_weakcmpr in 
> db.words.db_weakcmpr and...automagically everything really worked :)))
> Plz someone modify that script :))

The script was fixed many months ago to do this (Jan 10 to be exact).
I suspect you're still running an old copy of the script.  A "make
install" will only copy the new version of rundig if the old version
isn't around, to avoid clobbering a customized script.
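
If you'd rather patch your existing copy than replace it, the missing
step looks roughly like the sketch below.  The function name and the
loop over *.work are my own illustration, not a quote from the
distributed rundig; the only part taken from this thread is the rename
of db.words.db.work_weakcmpr to db.words.db_weakcmpr.

```shell
# Sketch: promote the .work databases written by "htdig -a" to their
# live names.  The argument is the database directory (an assumption
# about how your script names it).
promote_work_files() {
    dbdir=$1
    for f in "$dbdir"/*.work; do
        [ -f "$f" ] || continue        # glob matched nothing
        mv -f "$f" "${f%.work}"        # db.docdb.work -> db.docdb
    done
    # The overflow file does not end in ".work", so the loop above
    # never sees it; it has to be renamed separately.
    if [ -f "$dbdir/db.words.db.work_weakcmpr" ]; then
        mv -f "$dbdir/db.words.db.work_weakcmpr" \
              "$dbdir/db.words.db_weakcmpr"
    fi
}
```

You would call it as promote_work_files "$DBDIR" once htdig has
finished.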

> 2. What are the databases that I really need for updating?
> 
> After running rundig I had the following databases:
> 
> total 45M
> -rw-r--r--    1 root          11M Nov 22 11:19 db.words.db
> -rw-r--r--    1 root          11M Nov 20 18:21 db.words.db.work
> -rw-r--r--    1 root          10M Nov 22 11:19 db.excerpts
> -rw-r--r--    1 root          10M Nov 20 18:21 db.excerpts.work
> -rw-r--r--    1 root         712k Nov 22 11:19 db.docdb
> -rw-r--r--    1 root         712k Nov 20 18:21 db.docdb.work
> -rw-r--r--    1 root         320k Nov 22 11:19 db.docs.index
> -rw-r--r--    1 root         320k Nov 20 18:21 db.docs.index.work
> -rw-r--r--    1 root          16k Nov 22 11:16 db.words.db.work_weakcmpr
> -rw-r--r--    1 root          16k Nov 22 11:42 db.words.db_weakcmpr
> 
> I used the updatedig sample of the 3.2.0b4 and after the dig phase
> (htdig -a -t -vv -s -c htdig.conf) i got 2 new databases more:
> 
> -rw-r--r--    1 root          47M Nov 22 12:59 db.worddump (47Mb !!!)
> -rw-r--r--    1 root         7.1M Nov 22 12:59 db.docs
> 
> the first database was an ASCII file, the second one was a DATA file (command 
> file filename :))
> I tried to understand the reasons why I had a file of 47Mb and I saw the -t 
> htdig flag.

Yes, htdig -t, or htdump, will produce those last two files, which
are ASCII representations of the whole set of databases, which can
be reloaded into a set of databases on another machine using htload.
They're huge because they're uncompressed ASCII text.

> But on Htdig site there was something about db.wordlist database 
> (or ASCII file) and nothing about these ones I got :(( BTW in the updatedig 
> script there was a move command about db.wordlist.old in db.wordlist, but I 
> never found these files.

The db.wordlist file is from the 3.1.x series, not 3.2.x.  Your updatedig
script is probably not updated correctly for 3.2.  Have a look at the
contrib/examples/rundig.sh script in your 3.2.0b4 source snapshot for an
example of a working update script.  (Hmm.  I just noticed it's missing
a command to copy the .work_weakcmpr file, though, so you'd need to
fix that.)

> So I ask you more info about them...and ...Do I really need a 47Mb-file?....Do 
> I misunderstand or i need the -a -c flags only to update my databases? 

You can safely get rid of the -t option if you don't want the ASCII
dumps.  You don't need these normally, and now with the htdump command,
you can get them any time you want from the existing databases.  The -v
and -s options are of course also optional, and often only needed for
troubleshooting.

> 3. I modified the updatedig script to have a report of the updating every time.
> 
> The first report has a lot of "not changed" and few "changed" 
> (...and "pushing"). The second and the following reports were totally 
> different. They looked like the rundig report...no more changed/not 
> changed....just...
...
> Someone could explain me why? Why I don't get simply changed(--> pushing)/not 
> changed in my report?

It seems to me that your updatedig script isn't managing the .work files
correctly, so htdig ends up reindexing from scratch.  htdig -a needs to
have all the .work files in place in order to do an update dig, so the
script needs either to leave these copies around, or copy them before
running htdig -a.
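
As a rough sketch of the second approach (copying before the update
dig), something like the following would do.  The function name is
made up for illustration, and the file names are simply the ones from
your directory listing; adjust them if your configuration names the
databases differently.

```shell
# Sketch: make sure every live database has a .work copy in place
# before "htdig -a" runs, so the dig updates instead of starting over.
seed_work_files() {
    dbdir=$1
    for name in db.words.db db.excerpts db.docdb db.docs.index; do
        if [ -f "$dbdir/$name" ]; then
            cp -p "$dbdir/$name" "$dbdir/$name.work"
        fi
    done
    # The overflow file uses a different naming pattern for its
    # work copy, so it is handled on its own.
    if [ -f "$dbdir/db.words.db_weakcmpr" ]; then
        cp -p "$dbdir/db.words.db_weakcmpr" \
              "$dbdir/db.words.db.work_weakcmpr"
    fi
}
```

Note that this overwrites any .work files already there, which is what
you want if the live databases are the newer ones.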

> 4. Do I need a purge phase between digging and merging in the updatedig script?
> 
> In my Update Report I got a lot of "Not found: 
> http://www.unina.it/universit/....... Ref: http://www.unina.it/universit/....".
> Do I need to purge all these references?

Yes, again it seems you're running an outdated script.  You need to run
the htpurge command after htdig.  You don't need htmerge unless you're
merging two databases together; as of 3.2.0b3, that's all htmerge does.
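
The core of an update script then reduces to two commands, sketched
below.  The helper names and the DRY_RUN switch are my own additions
for illustration, and the -a and -c options on htpurge are from memory,
so check its usage message before relying on them.

```shell
# Sketch: update dig followed by a purge, with no htmerge step.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

update_dig() {
    conf=$1
    run htdig -a -c "$conf" || return 1      # update the .work databases
    run htpurge -a -c "$conf" || return 1    # drop "Not found" documents
}
```

E.g. DRY_RUN=1 update_dig htdig.conf shows what would be run without
touching the databases.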

> 5. I scheduled rundig to be execute 1 time a month and updatedig to be executed
> 
> 1 time a day. I executed manually the first "rundigging"...with the database 
> dir totally empty. When the cron will execute rundig again, it'll reindex from 
> scratch, but do I need to remove in advance the actual database? Or htdig will 
> delete them for me and then it'll rebuild them again? With my scheduling,  
> there will be a day in which rundig and updatedig will run together (one after 
> other). Do I really need to schedule updatedig soon after rundig? ...or I could
> don't run it in that day?

You don't need to run an update soon after a full reindexing, but
avoiding this unnecessary update step may make your cron scheduling a
bit complicated.  A better way might be to put a hook right in the update
script to force a full reindexing once per month.  There are a number of
different tricks you could use to do this, but it would depend a lot on
how your update script works in the first place.

E.g. if you use contrib/examples/rundig.sh, which leaves copies of the
.work files around for next time, you could add a hook like this before
calling htdig in that script, to remove the .work files and force a full
reindexing at the start of the month:

case "`LC_TIME=C date`" in
Sun\ ???\ \ [1-7]\ *)   # remove old databases on the first Sunday of the month
        rm -f "$DBDIR"/db.*.work "$DBDIR"/db.words.db.work_weakcmpr
        ;;
esac
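
The case pattern above depends on the column layout of date's default
output (two spaces before a single-digit day).  If your date command
formats things differently, a variant that asks for the fields
explicitly might be safer.  This is only a sketch, and is_first_sunday
is a made-up helper name:

```shell
# Sketch: succeed only for a Sunday that falls on day 1 through 7.
is_first_sunday() {
    weekday=$1      # e.g. "Sun", as printed by: LC_TIME=C date +%a
    day=$2          # e.g. "05", as printed by: date +%d
    [ "$weekday" = Sun ] || return 1
    case $day in
        0[1-7]) return 0 ;;
        *)      return 1 ;;
    esac
}

# Usage in the update script, before calling htdig:
#   if is_first_sunday "`LC_TIME=C date +%a`" "`date +%d`"; then
#       rm -f "$DBDIR"/db.*.work "$DBDIR"/db.words.db.work_weakcmpr
#   fi
```

Taking the weekday and day as arguments keeps the check easy to try
out by hand with any date you like.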


-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

