Here is the news:

I reindexed with the debugger (learned how to do it in the process) 
using a smaller subset and everything worked.

I am now reindexing my whole data set, and it seems to me that htdig is 
stuck. Here's the info:

I launched it like this:

% bin/htdig -a -v -s -c conf/myconf.conf

sending the output to /var/log/htdig.log

Here is the end of the log:

% tail /var/log/htdig.log

22097:24487:13:http://www.medi1.co.ma/present/animateurs/laila_an_nom.htm: 
-********--******-*-- size = 6524
22098:24486:13:http://www.medi1.co.ma/present/animateurs/montassir.htm: 
-********--******-*-- size = 6523
22099:24485:13:http://www.medi1.co.ma/present/animateurs/nawal.htm: 
-********--******-*-- size = 6519
22100:24484:13:http://www.medi1.co.ma/present/animateurs/jalil_n.htm: 
-********--******-*-- size = 6519
22101:24483:13:http://www.medi1.co.ma/present/animateurs/bouchra_nom.htm: 
-********--******-*-- size = 6523
22102:24467:13:http://www.medi1.co.ma/present/journalistes/benseghir_n.htm: 
-********--******-*-- size = 6535
22103:24466:13:http://www.medi1.co.ma/present/journalistes/sbai_n.htm: 
-********--******-*-- size = 6530
22104:24476:13:http://www.medi1.co.ma/present/journalistes/fontana_n.htm: 
-********--******-*-- size = 6533
22105:24464:13:http://www.medi1.co.ma/present/journalistes/fourt_n.htm: 
-********--******-*-- size = 6531
22106:24463:13:http://www.medi1.co.ma/present/journali

% ls -l /var/log/htdig.log
-rw-rw-rw-  1 root  wheel  2105344 Oct  9 16:16 /var/log/htdig.log

% ps -auwx|grep htdig
root   21936  40.2  4.4    31368  25868 std- R    3605:49.74 
bin/htdig -a -v -s -c conf/myconf.conf

% top

Processes:  73 total, 3 running, 70 sleeping... 223 threads            09:29:21
Load Avg:  1.56, 1.70, 1.73     CPU usage:  20.4% user, 79.6% sys, 0.0% idle
SharedLibs: num =  103, resident = 18.8M code, 640K data, 5.43M LinkEdit
MemRegions: num = 5526, resident =  152M + 3.93M private, 15.9M shared
PhysMem:  60.5M wired,  197M active,  282M inactive,  540M used, 36.4M free
VM:  910M + 50.7M   40879(0) pageins, 181(0) pageouts

   PID COMMAND      %CPU   TIME   #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
21936 htdig       66.0% 95:28:29   1    16    29  24.4M  2.27M  25.2M  30.6M

% cd db
% ll
total 246424
drwxrwxr-x  12 root  admin       364 Oct  5 20:13 .
drwxrwxr-x  11 root  admin       330 Oct  5 18:48 ..
-rw-r--r--   1 root  admin     98304 Oct  5 19:52 db.docdb
-rw-r--r--   1 root  admin   5480448 Oct  6 20:07 db.docdb.work
-rw-r--r--   1 root  admin     49152 Oct  5 19:52 db.docs.index
-rw-r--r--   1 root  admin   2670592 Oct  6 20:07 db.docs.index.work
-rw-r--r--   1 root  admin   1417216 Oct  5 19:52 db.excerpts
-rw-r--r--   1 root  admin  73809920 Oct  6 20:07 db.excerpts.work
-rw-r--r--   1 root  admin    619520 Oct  5 19:53 db.words.db
-rw-r--r--   1 root  admin  41784320 Oct  6 20:06 db.words.db.work
-rw-r--r--   1 root  admin     16384 Oct  5 20:13 db.words.db.work_weakcmpr
-rw-r--r--   1 root  admin     16384 Oct  5 19:53 db.words.db_weakcmpr

Should I kill it and restart the dig so that it continues? Am I going 
to corrupt my db? The log file is not growing anymore (although its 
modification date is recent, for a reason I don't know).
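
A rough way to confirm that the log has really stopped growing is to compare its size before and after a pause. This is only a sketch: the function name log_is_growing and the 60-second default are made up for illustration, nothing here comes from ht://Dig itself, and a log that is not growing does not prove that htdig has stopped doing work on the db files.

```shell
# log_is_growing FILE [SECONDS]
# Hypothetical helper (not part of ht://Dig): exits 0 if FILE got bigger
# during the pause, 1 if its size did not increase, 2 if FILE is missing.
log_is_growing() {
    file=$1
    interval=${2:-60}                     # assumed default pause, in seconds
    [ -f "$file" ] || return 2
    before=$(( $(wc -c < "$file") ))      # $(( )) drops the padding wc may print
    sleep "$interval"
    after=$(( $(wc -c < "$file") ))
    [ "$after" -gt "$before" ]
}
```

For example, "log_is_growing /var/log/htdig.log 120" could be run a few times in a row; repeated exit codes of 1 would suggest the log is stalled rather than just slow.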

Thanks for your time, patience and help.

I am writing this on Oct 9 at 16:30.

>According to Franck Horlaville:
>>  My dig is still not working as it should, i.e. all searches return nothing.
>>
>>  Using ht://Dig 3.2.0b4-081901 on MacOS X 10.0.4 (Darwin)
>>
>>  I re-indexed everything using -i and this is what I got at the end 
>>of the log :
>>
>>  12815:13793:6:http://www.mfie.gov.ma/di/Documentation/doc_generale.htm:
>>  ++ size = 1127
>> 
>>12816:13800:6:http://www.mfie.gov.ma/di/textes/enregistrement/timbre/sommaire2.htm:
>>  ++*++**********Deleted, not found: ID: 403 URL:
>>  http://www.marocnet.net.ma/ve4130/p11.html
>
>There seems to be an abrupt transition above from htdig to htpurge messages.
>It seems to me that htdig may have died suddenly.
>
>>  Deleted, no excerpt: ID: 550 URL: 
>>http://www.marocnet.net.ma/ve3899/p29.html
>>  (...)
>>  Deleted, no excerpt: ID: 5391 URL:
>>  http://www.mincom.gov.ma/news/2000/2401to2901/mis/mis.htm
>>  htpurge: 10
>>  Deleted, no excerpt: ID: 6344 URL:
>>  http://www.mcinet.gov.ma/mciweb/Ti/CTNTI_Cadre.htm
>>  Deleted, no excerpt: ID: 6491 URL:
>>  http://www.mcinet.gov.ma/mciweb/LeMinistere/DQN_DL.htm
>>  (...)
>>  Deleted, no excerpt: ID: 12578 URL:
>>  http://www.mfie.gov.ma/marches/edaag/Preselection.htm
>>  Deleted, not found: ID: 12983 URL:
>>  http://www.mfie.gov.ma/almalya/maliya25/p34.html
>>  htpurge: 50
>>
>>  If I retry running htpurge, I get the exact same end, so it's not
>>  like the program got killed in the middle.
>
>htpurge may not have been, but htdig may have.
>
>>  Here are my db sizes:
>>  root# ll
>>  total 264760
>>  drwxrwxr-x  7 root  admin       264 Sep 12 02:12 .
>>  drwxrwxr-x  9 root  admin       262 Aug 24 23:19 ..
>>  -rw-r--r--  1 fh    admin   2727936 Sep 12 02:12 db.docdb
>>  -rw-r--r--  1 fh    admin   1327104 Sep 12 02:12 db.docs.index
>>  -rw-r--r--  1 fh    admin  52641792 Sep 12 02:12 db.excerpts
>>  -rw-r--r--  1 fh    admin  78840832 Sep 12 02:12 db.words.db
>>  -rw-r--r--  1 fh    admin     16384 Sep  5 08:49 db.words.db_weakcmpr
>>
>>  root# ./htstat -vv -u -c ../conf/myconf.conf
>>  htstat: Total documents: 198
>>  htstat: URLs in database:
>>           http://www.oncf.org.ma/lettrecommerciale3.htm
>>           http://www.marocnet.net.ma/ve4016/
>>           http://www.marocnet.net.ma/ve4033/p25.html
>>  (...)
>>           http://www.mfie.gov.ma/db/lf2000-1s/tableaux/francais/tabl-b2.htm
>>  htstat: Total words: 0
>>  Bus error
>
>This may be due to a corrupt database because of htdig crashing, so the
>first order of business would be to find out if/why htdig doesn't run
>to completion.  In any case, though, it might be useful to know where
>in htstat the bus error occurs, so that a more useful error check could
>be put in place there.  Could you run it under the debugger and get
>a backtrace?
>
>--
>Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
>Spinal Cord Research Centre       WWW: 
>http://www.scrc.umanitoba.ca/~grdetil
>Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
>Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930


-- 
Franck Horlaville

Athena Online
+212 (0) 37 68 28 08
http://www.athena.online.co.ma/
mailto:[EMAIL PROTECTED]

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
