Hi,
First, I'll admit I've been working with htdig for only a few weeks, so I
apologize if this has been asked before. (I tried searching the mailing list
archives and didn't find anything, but the search function for the 'general'
list is broken.)
I've ported htdig 3.2.0b3 to Compaq Tru64 UNIX (with difficulty -- I'll be
submitting diffs once I get b4 to build).
Using the provided rundig script, I had a problem indexing a certain set of
web pages (but not others). The first indexing run completes without a
problem, but on subsequent runs htpurge dumps core with messages like this:
FATAL ERROR:WordDBPage::~WordDBPage: page not empty
FATAL ERROR at file:WordDBPage.h line:484 !!!
While debugging, I found that htpurge seemed to be reading bogus data from
the database. I noticed that in htdig.cc, when the -i (initial) option is
set, htdig unlinks all the database files *except* db.words.db_weakcmpr.
Why? Is this an oversight? I found that if I remove this file by hand before
rerunning htdig (or rundig), everything works fine. Likewise, if I modify
main() in htdig.cc to unlink() it in the initial case, everything works
fine. A context diff of this change is included below.
Could someone tell me what db.words.db_weakcmpr is for? I haven't been able
to figure it out yet.
Thanks,
Peter
============
Peter Derr
Compaq Tru64 UNIX Internet Engineering Group
Tel: 01.603.884.2977
[EMAIL PROTECTED]
htdig/htdig.cc diffs:
***************
*** 255,263 ****
                       filename.get()));
      }
  
!     const String word_filename = config["word_db"];
      if (initial)
          unlink(word_filename);
  
      // Initialize htword
      WordContext::Initialize(config);
--- 255,267 ----
                       filename.get()));
      }
  
!     String word_filename = config["word_db"];
      if (initial)
+     {
          unlink(word_filename);
+         word_filename += (const char *)"_weakcmpr";
+         unlink(word_filename);
+     }
  
      // Initialize htword
      WordContext::Initialize(config);
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html