According to [EMAIL PROTECTED]:
> Is there someone who could help me, plz?
Well, maybe, yes, but you have to be patient. We're all busy people, so you
shouldn't expect quick responses, especially over a weekend!

> I installed 3.2.0b4 and there are a few things that I can't understand about
> the htdig databases. BTW, I use the standard htdig.conf without any special
> parameters for the databases.
>
> 1. What's the use of db.words.db_weakcmpr?

My understanding is that it contains some overflow records from the db.words.db
database, for cases where compression is weaker than expected.

> I used the rundig sample from 3.2.0b4 and there isn't any "move" in it for
> this database. When I ran the script, everything seemed to work, but the
> engine didn't find anything during searches. :((
> I had some suspicions, so I renamed db.words.db.work_weakcmpr to
> db.words.db_weakcmpr and... automagically everything really worked :)))
> Plz, someone modify that script :))

The script was fixed many months ago to do this (Jan 10, to be exact). I
suspect you're still running an old copy of the script. A "make install" will
only copy the new version of rundig if the old version isn't around, to avoid
clobbering a customized script.

> 2. What are the databases that I really need for updating?
>
> After running rundig I had the following databases:
>
> total 45M
> -rw-r--r--  1 root   11M Nov 22 11:19 db.words.db
> -rw-r--r--  1 root   11M Nov 20 18:21 db.words.db.work
> -rw-r--r--  1 root   10M Nov 22 11:19 db.excerpts
> -rw-r--r--  1 root   10M Nov 20 18:21 db.excerpts.work
> -rw-r--r--  1 root  712k Nov 22 11:19 db.docdb
> -rw-r--r--  1 root  712k Nov 20 18:21 db.docdb.work
> -rw-r--r--  1 root  320k Nov 22 11:19 db.docs.index
> -rw-r--r--  1 root  320k Nov 20 18:21 db.docs.index.work
> -rw-r--r--  1 root   16k Nov 22 11:16 db.words.db.work_weakcmpr
> -rw-r--r--  1 root   16k Nov 22 11:42 db.words.db_weakcmpr
>
> I used the updatedig sample from 3.2.0b4, and after the dig phase
> (htdig -a -t -vv -s -c htdig.conf) I got 2 more new databases:
>
> -rw-r--r--  1 root   47M Nov 22 12:59 db.worddump   (47Mb !!!)
> -rw-r--r--  1 root  7.1M Nov 22 12:59 db.docs
>
> The first database was an ASCII file, the second one was a DATA file
> (according to the "file filename" command :))
> I tried to understand why I had a 47Mb file, and I saw the -t htdig flag.

Yes, htdig -t, or htdump, will produce those last two files, which are ASCII
representations of the whole set of databases, and which can be reloaded into
a set of databases on another machine using htload. They're huge because
they're uncompressed ASCII text.

> But on the htdig site there was something about the db.wordlist database
> (or ASCII file) and nothing about these ones I got :(( BTW, in the updatedig
> script there was a move command from db.wordlist.old to db.wordlist, but I
> never found these files.

The db.wordlist file is from the 3.1.x series, not 3.2.x. Your updatedig
script is probably not updated correctly for 3.2. Have a look at the
contrib/examples/rundig.sh script in your 3.2.0b4 source snapshot for an
example of a working update script. (Hmm. I just noticed it's missing a
command to copy the .work_weakcmpr file, though, so you'd need to fix that.)
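As a rough illustration of that missing step, here is a minimal sketch written as a hypothetical shell function, install_work_dbs (it is not part of ht://Dig or of rundig.sh). It assumes the database file names from the listing above and takes the database directory (e.g. $DBDIR) as its only argument. It copies rather than moves, so the .work files stay in place for the next htdig -a run. Note that the overflow file does not follow the ".work" suffix pattern, which is why a plain loop over *.work misses it.

```shell
#!/bin/sh
# Hypothetical helper, a sketch only (not the actual rundig.sh code).
# Assumes the 3.2.x file names shown in the directory listing above.
# Usage: install_work_dbs /path/to/db/dir

install_work_dbs() {
    (
        cd "$1" || exit 1
        # Copy each updated .work database over its live name, leaving
        # the .work copy behind for the next update dig.
        for f in db.words.db db.excerpts db.docdb db.docs.index; do
            if [ -f "$f.work" ]; then
                cp -f "$f.work" "$f" || exit 1
            fi
        done
        # The overflow file is named db.words.db.work_weakcmpr, not
        # db.words.db_weakcmpr.work, so it needs its own copy command.
        if [ -f db.words.db.work_weakcmpr ]; then
            cp -f db.words.db.work_weakcmpr db.words.db_weakcmpr || exit 1
        fi
    )
}
```

In an update script you would call it as install_work_dbs "$DBDIR" once the htdig and htpurge steps have finished writing the .work files.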
> So I'm asking you for more info about them... and... do I really need a
> 47Mb file? Am I misunderstanding, or do I only need the -a and -c flags to
> update my databases?

You can safely get rid of the -t option if you don't want the ASCII dumps. You
don't need these normally, and now with the htdump command, you can get them
any time you want from the existing databases. The -v and -s options are of
course also optional, and often only needed for troubleshooting.

> 3. I modified the updatedig script to give me a report of the update every
> time.
>
> The first report had a lot of "not changed" and only a few "changed"
> (...and "pushing"). The second and the following reports were totally
> different. They looked like the rundig report... no more changed/not
> changed... just... ...
> Could someone explain why? Why don't I simply get changed (--> pushing)/not
> changed in my report?

It seems to me that your updatedig script isn't managing the .work files
correctly, so htdig ends up reindexing from scratch. htdig -a needs to have
all the .work files in place in order to do an update dig, so the script
needs either to leave these copies around, or to copy them before running
htdig -a.

> 4. Do I need a purge phase between digging and merging in the updatedig
> script?
>
> In my Update Report I got a lot of "Not found:
> http://www.unina.it/universit/....... Ref: http://www.unina.it/universit/....".
> Do I need to purge all these references?

Yes. Again, it seems you're running an outdated script. You need the htpurge
command after htdig. You don't need htmerge unless you're merging two
databases together; since 3.2.0b3, that's all htmerge does.

> 5. I scheduled rundig to be executed once a month and updatedig to be
> executed once a day. I ran the first "rundigging" manually... with the
> database dir totally empty. When cron executes rundig again, it'll reindex
> from scratch, but do I need to remove the current database in advance?
> Or will htdig delete them for me and then rebuild them? With my scheduling,
> there will be a day on which rundig and updatedig both run (one after the
> other). Do I really need to schedule updatedig right after rundig? ...or
> could I skip it on that day?

You don't need to run an update right after a full reindexing, but avoiding
this unnecessary update step may make your cron scheduling a bit complicated.
A better way might be to put a hook right in the update script to force a full
reindexing once per month. There are a number of different tricks you could
use to do this, but it would depend a lot on how your update script works in
the first place. E.g., if you use contrib/examples/rundig.sh, which leaves
copies of the .work files around for next time, you could add a hook like
this before calling htdig in that script, to remove the .work files and force
a full reindexing at the start of the month:

case "`LC_TIME=C date`" in
Sun\ ???\ \ [1-7]\ *)  # remove old database on first Sunday of month
    rm -f "$DBDIR"/db.*.work "$DBDIR"/db.words.db.work_weakcmpr
    ;;
esac

--
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html

