Greetings,

This should be a lesson to me about posting bug reports 
from memory...  (One day I'll get my development machine on 
line :)

On Fri, 29 Nov 2002 10:09, Lachlan Andrew wrote:

> - Phrase searching doesn't work if the phrase contains a
> stop word.  A search for "Something is happening" will
> fail, because "is" will be ignored.  (I haven't yet
> checked if it matches a document containing "Something
> happening".)

It turns out that *short* words are handled correctly 
(ignored by both htdig and htsearch), but "bad" words are 
not.  They are not inserted into the inverted file, but 
they *are* counted by HTML.cc (and presumably by the other 
parsers) when determining the location of subsequent words.  
The backward-compatible fix is to change htsearch (a 
five-line patch), but the more elegant solution would be to 
change htdig to treat the two consistently.  My 
preference would be to specify the *true* location of all 
words, even counting short words.  That way we could 
distinguish between a true phrase match and a "phrase 
match, but possibly with short words in between".  However, 
completely ignoring both short and "bad" words would be 
a step in the right direction.
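
To make the mismatch concrete, here is a toy sketch in 
Python.  It is not the htdig or htsearch code: the names 
MIN_WORD_LENGTH, BAD_WORDS, index_locations and 
phrase_matches are all made up for illustration, and the 
word lists are assumptions.  It only models the behaviour 
described above: short words are neither indexed nor 
counted, while "bad" words are not indexed but may still 
advance the location counter.

```python
MIN_WORD_LENGTH = 3          # assumed threshold for "short" words
BAD_WORDS = {"the", "and"}   # assumed "bad" (stop) word list


def index_locations(text, count_bad_words):
    """Return {word: [locations]} for the words that get indexed.

    Short words are skipped without advancing the location.
    Bad words are never indexed; they advance the location only
    when count_bad_words is True (the inconsistent behaviour).
    """
    index = {}
    location = 0
    for word in text.lower().split():
        if len(word) < MIN_WORD_LENGTH:
            continue  # short word: not indexed, not counted
        if word in BAD_WORDS:
            if count_bad_words:
                location += 1  # not indexed, but still counted
            continue
        index.setdefault(word, []).append(location)
        location += 1
    return index


def phrase_matches(index, phrase):
    """Naive phrase search that drops short and bad words from the
    query and then requires the remaining words at consecutive
    locations."""
    words = [w for w in phrase.lower().split()
             if len(w) >= MIN_WORD_LENGTH and w not in BAD_WORDS]
    if not words:
        return False
    for start in index.get(words[0], []):
        if all(start + i in index.get(w, [])
               for i, w in enumerate(words)):
            return True
    return False


doc = "over the rainbow is gold"
inconsistent = index_locations(doc, count_bad_words=True)
consistent = index_locations(doc, count_bad_words=False)
```

With the inconsistent index, a phrase search for "over the 
rainbow" fails because "rainbow" sits two locations after 
"over", whereas "rainbow is gold" still matches because the 
short word "is" was never counted.  Ignoring bad words 
completely when assigning locations makes both searches 
succeed in this toy model.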

> - I think I may have broken the code to stop "index.html"
> being stripped from "file:///" URLs.  I'm not sure if the
> bug has been committed yet, but my next patch will fix
> it.

That was my bad memory of the testing I'd done.  All is 
well on that count.

Cheers,
Lachlan

-- 
Lachlan Andrew  Phone: +613 8344-3816 Fax: +613 8344-6678
Dept of Electrical and Electronic Engg          CRICOS Provider Code
University of Melbourne, Victoria, 3010  AUSTRALIA      00116K


_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev