Greetings,

This should be a lesson to me about posting bug reports from memory... (One day I'll get my development machine on line :)
On Fri, 29 Nov 2002 10:09, Lachlan Andrew wrote:
> - Phrase searching doesn't work if the phrase contains a
>   stop word. A search for "Something is happening" will
>   fail, because "is" will be ignored. (I haven't yet
>   checked if it matches a document containing "Something
>   happening".)

It turns out that *short* words are handled correctly (ignored by htdig and htsearch), but "bad" words are not. They are not inserted into the inverted file, but they *are* counted by HTML.cc (and presumably by other parsers) when determining the location of subsequent words.

The backward-compatible fix is to fix htsearch (a five-line patch), but the more elegant solution would be to change htdig to treat the two consistently. My preference would be to specify the *true* location of all words, even counting short words. That way we could distinguish between a true phrase match and a "phrase match, but possibly with short words in between". However, completely ignoring both short and "bad" words would be a step in the right direction.

> - I think I may have broken the code to stop "index.html"
>   being stripped from "file:///" URLs. I'm not sure if the
>   bug has been committed yet, but my next patch will fix
>   it.

That was my bad memory of the testing I'd done. All is well on that count.

Cheers,
Lachlan

-- 
Lachlan Andrew  Phone: +613 8344-3816  Fax: +613 8344-6678
Dept of Electrical and Electronic Engg  CRICOS Provider Code
University of Melbourne, Victoria, 3010 AUSTRALIA  00116K

_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
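
To make the location mismatch described above concrete, here is a small Python model of it. This is not ht://Dig code: the names `index_words`, `phrase_matches`, `MIN_WORD_LENGTH` and `BAD_WORDS` are hypothetical, the word list and length limit are made up, and the sketch assumes the search side drops short and "bad" words from the query and then expects the remaining words at adjacent locations.

```python
MIN_WORD_LENGTH = 3          # assumed limit; shorter words count as "short"
BAD_WORDS = {"the", "and"}   # assumed "bad" (stop) word list


def index_words(text, count_bad_words):
    """Return {word: [locations]} for a single document.

    Short words are skipped and never advance the location counter.
    Bad words are never stored; count_bad_words decides whether they
    still advance the counter (the inconsistent case from the mail).
    """
    index = {}
    location = 0
    for word in text.lower().split():
        if len(word) < MIN_WORD_LENGTH:
            continue
        if word in BAD_WORDS:
            if count_bad_words:
                location += 1
            continue
        index.setdefault(word, []).append(location)
        location += 1
    return index


def phrase_matches(index, phrase):
    """Drop short and bad words from the query, then require adjacent locations."""
    words = [w for w in phrase.lower().split()
             if len(w) >= MIN_WORD_LENGTH and w not in BAD_WORDS]
    if not words:
        return False
    for start in index.get(words[0], []):
        if all(start + i in index.get(w, []) for i, w in enumerate(words)):
            return True
    return False


doc = "We went over the hill"
inconsistent = index_words(doc, count_bad_words=True)   # bad words leave a gap
consistent = index_words(doc, count_bad_words=False)    # bad words fully ignored

print(phrase_matches(inconsistent, "over the hill"))
print(phrase_matches(consistent, "over the hill"))
```

In this model the short word "We" causes no trouble in either index, while the bad word "the" leaves a hole in the location sequence only when the indexer counts it, which is why the phrase lookup misses in the first case and hits in the second.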