Is there anyone who could help me, please?

Thanks

----- Forwarded message from [EMAIL PROTECTED] -----
Date: Sat, 24 Nov 2001 03:56:35 +0100 (CET)
From: [EMAIL PROTECTED]
Reply-To: [EMAIL PROTECTED]
Subject: [htdig] >>> Databases, rundig & updatedig
To: HtDig General Mailing list <[EMAIL PROTECTED]>

Hello,
I installed 3.2.0b4, and there are a few things I can't understand about the
htdig databases. BTW, I use the standard htdig.conf without any special
parameters for the databases.

1. What's the use of db.words.db_weakcmpr?

I used the rundig sample script from 3.2.0b4, and it has no "move" command for
this database. When I ran the script, everything seemed to work, but the engine
didn't find anything when I searched. :((
I had a suspicion, so I renamed db.words.db.work_weakcmpr to
db.words.db_weakcmpr and... automagically, everything really worked :)))
Could someone please fix that script? :))
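The manual fix described above can be sketched as a small shell function. It assumes the dig was run with htdig -a, which makes htdig write the new databases to *.work copies, and that the caller passes the database directory; the function name is hypothetical and this is not the code of the rundig sample itself.

```shell
# Hypothetical helper: after a dig with "htdig -a", move every *.work file
# over the live file of the same name, then move the
# db.words.db.work_weakcmpr file as well, which the 3.2.0b4 rundig
# sample leaves behind.
promote_work_files() {
    dbdir=$1
    (
        cd "$dbdir" || exit 1
        # Nothing to do if the dig did not produce work files.
        test -f db.docdb.work || exit 0
        for f in *.work; do
            # db.words.db.work -> db.words.db, db.excerpts.work -> db.excerpts, ...
            mv -f "$f" "${f%.work}"
        done
        # ".work" sits in the middle of this name, so the loop above
        # does not match it.
        if test -f db.words.db.work_weakcmpr; then
            mv -f db.words.db.work_weakcmpr db.words.db_weakcmpr
        fi
    )
}
```

Usage would be something like promote_work_files /path/to/your/db/dir, run after the dig (and any purge step) has finished.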

2. Which databases do I really need for updating?

After running rundig I had the following databases:

total 45M
-rw-r--r--    1 root          11M Nov 22 11:19 db.words.db
-rw-r--r--    1 root          11M Nov 20 18:21 db.words.db.work
-rw-r--r--    1 root          10M Nov 22 11:19 db.excerpts
-rw-r--r--    1 root          10M Nov 20 18:21 db.excerpts.work
-rw-r--r--    1 root         712k Nov 22 11:19 db.docdb
-rw-r--r--    1 root         712k Nov 20 18:21 db.docdb.work
-rw-r--r--    1 root         320k Nov 22 11:19 db.docs.index
-rw-r--r--    1 root         320k Nov 20 18:21 db.docs.index.work
-rw-r--r--    1 root          16k Nov 22 11:16 db.words.db.work_weakcmpr
-rw-r--r--    1 root          16k Nov 22 11:42 db.words.db_weakcmpr

I used the updatedig sample script from 3.2.0b4, and after the dig phase
(htdig -a -t -vv -s -c htdig.conf) I got two more new databases:

-rw-r--r--    1 root          47M Nov 22 12:59 db.worddump (47Mb !!!)
-rw-r--r--    1 root         7.1M Nov 22 12:59 db.docs

The first one is an ASCII file; the second is a data file (according to the
"file" command :)).
I tried to understand why I had a 47 MB file, and I noticed the htdig -t flag.
But the htdig site only mentions a db.wordlist database (or ASCII file) and
says nothing about the files I got :(( BTW, the updatedig script has a command
that moves db.wordlist.old to db.wordlist, but I never saw either of these
files.
So I'd like more info about them... and... do I really need a 47 MB file? Am I
misunderstanding, or do I only need the -a and -c flags to update my databases?
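For comparison, the dig command from the updatedig run above with -t dropped might look like the sketch below. It assumes, per the htdig documentation, that -t only asks for an extra ASCII version of the databases, written to the doc_list and word_dump files (db.docs and db.worddump with the default settings); whether -a and -c alone are enough is the open question, and the wrapper name is hypothetical.

```shell
# Hypothetical wrapper: the update dig used above, minus -t, so the
# db.docs and db.worddump exports are not written.
update_dig() {
    conf=${1:-htdig.conf}
    # -a: write to *.work copies, -vv: verbose, -s: print statistics,
    # -c: configuration file
    htdig -a -vv -s -c "$conf"
}
```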

3. I modified the updatedig script to produce a report on every update.

The first report had a lot of "not changed" and a few "changed" lines
(...and "pushing"). The second and subsequent reports were completely
different. They looked like the rundig report... no more changed/not
changed... just...

1616:1292:4:http://www.unina.it/universit/amministrazione/personale/mobilita.html:
title: UniNa_Amminstrazione
 size = 6811
1617:877:4:http://www.unina.it/universit/amministrazione/statistiche/dal97/medicina.html:
title: UniNa_Amminstrazione
 size = 44153

...and...

1622:1928:4:http://www.unina.it/universit/didattica/economia/PERFeco_sotto.html:
title: UniNa_Ateneo

   pushing http://www.unina.it/universit/didattica/economia/PERFeco_laterale.html
+
   pushing http://www.unina.it/universit/didattica/economia/PERFeco_centrale.html
+ size = 871

Could someone explain why? Why don't I simply get changed (--> pushing) / not
changed in my report?

4. Do I need a purge phase between digging and merging in the updatedig script?

In my update report I got a lot of "Not found:
http://www.unina.it/universit/....... Ref: http://www.unina.it/universit/...."
lines. Do I need to purge all these references?

5. I scheduled rundig to run once a month and updatedig to run once a day.

I ran the first "rundigging" manually... with the database directory completely
empty. When cron runs rundig again, it will reindex from scratch, but do I need
to remove the current databases beforehand? Or will htdig delete them for me
and then rebuild them? With my schedule, there will be a day on which rundig
and updatedig both run (one after the other). Do I really need to schedule
updatedig right after rundig? ...or could I skip it on that day?
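If skipping the daily update on the rebuild day turns out to be acceptable (which is exactly what is being asked), one way to avoid the overlap is a single daily cron job that picks one script per day. A sketch, assuming rundig and updatedig are on cron's PATH and that the 1st of the month is the rebuild day; the function name and the choice of day are hypothetical.

```shell
# Hypothetical daily cron wrapper: full rebuild on the 1st of the month,
# incremental update on every other day, so the two never run on the
# same day. Assumes rundig and updatedig are on cron's PATH.
nightly_index() {
    # Day of month as two digits (01-31); a value can be passed in for testing.
    day=${1:-$(date +%d)}
    if [ "$day" = "01" ]; then
        rundig
    else
        updatedig
    fi
}
```

A single daily crontab entry calling this would then replace the separate monthly and daily entries.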


Thank you

Pietro Palladino



_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]>
with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html

----- End forwarded message -----
