Re: htdig: Possible htsearch bug

George Adams Mon, 23 Nov 1998 13:06:10 -0500

> Do you have "remove_bad_urls" set? Since this is now 
> a "bad url" it won't be removed unless this option
> is set.

Yes, I have added "remove_bad_urls: true" to my htdig.conf.

Let me clarify my setup, just in case it helps explain the problem better:

TEST #1
1)  foo.html contains a link to bar.html.  A search for a keyword that appears 
in bar.html (and nowhere else on the site) works as expected.

2)  bar.html is deleted.  foo.html now contains a link
to a nonexistent file.  When "rundig" is run, the missing file is noticed and removed, 
and a warning message about the "not found" file is generated.

TEST #2
1)  foo.html again contains a link to bar.html.  A search for a keyword in bar.html 
works as expected.

2)  bar.html is deleted AND the link to bar.html is removed from foo.html.  When 
"rundig" is rerun, no warnings are generated - however, the total number of indexed 
documents is now one fewer than before.

In both test cases, after step 2), searching for the keyword that used to appear in 
bar.html causes the bogus search result screen to appear:

     Documents 1 - 1 of 1 matches. More *'s 
     indicate a better match. 

followed by a blank page.  

---------------

Again, I've found that blowing away the htdig/db directory before rerunning "rundig" 
fixes the problem.  

A grep of db/db.words.db shows the keyword from the now-deleted bar.html is still in 
the wordlist - could this be why htsearch still thinks one page matches the search 
criteria?
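
For anyone who wants to repeat that check without grep, here is a minimal Python sketch. It only tests whether the raw bytes of the keyword occur somewhere in the file; it assumes nothing about the internal format of db.words.db, and the function name is made up for illustration.

```python
from pathlib import Path


def file_contains_word(path, word):
    """Return True if the raw bytes of `word` occur anywhere in the file.

    This is a plain byte search, roughly what grep reports for a binary
    file. It says nothing about how the database is structured.
    """
    data = Path(path).read_bytes()      # fine for small test databases
    return word.encode("ascii") in data
```

Calling file_contains_word("db/db.words.db", "somekeyword") before and after rerunning "rundig" would show whether the word survives the run. A hit only proves the bytes are still in the file, not that htsearch actually uses that entry.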

Is db.words.db NOT one of the files that gets erased when "rundig" runs "htdig -i"?
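
To make the suspicion concrete, here is a toy model in Python. It is NOT ht://Dig's code or data layout; the two dictionaries, the sample words, and both function names are assumptions made up for illustration. It shows how a word list that still points at a removed document could yield "1 match" with nothing to display, and how purging the stale entries would make the count agree again.

```python
def search(word_index, doc_table, keyword):
    """Return (match_count, displayable_results) for a keyword.

    The count comes from the word list alone; results can only be
    shown for documents that still have a record in doc_table.
    """
    doc_ids = word_index.get(keyword, set())
    shown = [doc_table[d] for d in sorted(doc_ids) if d in doc_table]
    return len(doc_ids), shown


def purge_stale(word_index, doc_table):
    """Drop word-list references to documents that no longer exist."""
    cleaned = {}
    for word, doc_ids in word_index.items():
        live = {d for d in doc_ids if d in doc_table}
        if live:
            cleaned[word] = live
    return cleaned


# Hypothetical index: "gadget" appears only in bar.html (document 2).
word_index = {"widget": {1}, "gadget": {2}}
doc_table = {1: "foo.html", 2: "bar.html"}

# bar.html is deleted: its document record goes away, but in this
# model the word list is left untouched.
del doc_table[2]

count, shown = search(word_index, doc_table, "gadget")
print(f"Documents 1 - {count} of {count} matches.")  # Documents 1 - 1 of 1 matches.
print(shown)                                         # []

# After purging stale references the keyword no longer matches.
count, shown = search(purge_stale(word_index, doc_table), doc_table, "gadget")
print(count, shown)                                  # 0 []
```

If something like this is what happens, it would also explain why removing the whole db directory before rerunning "rundig" makes the problem go away.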

