On Thursday 20 February 2003 02:19, Gilles Detillieux wrote:
> According to Lachlan Andrew:
> > I hadn't realised it, but the valid_punctuation attribute seems
> > to be treated as an *optional* word break. (The docs say it is
> > *not* a word break)
>
> I guess the docs haven't kept up with what the code does.
> this functionality was extended to also index each word part,
> so that something like "post-doctoral" gets indexed as
> postdoctoral, post and doctoral. This greatly enhances searches for
> compound words, or parts thereof, but it tends to break down when
> you're indexing something that's not really words...
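
For readers who want the expansion spelled out, here is a rough Python sketch of the behaviour Gilles describes. It is not ht://Dig's actual code: the function name expand_compound is made up for illustration, and treating "-" as the only valid_punctuation character is an assumption, not the real default.

```python
def expand_compound(token, valid_punctuation="-"):
    """Return the index terms for one token.

    Hypothetical helper: the joined form comes first, then each part.
    """
    parts = []
    current = ""
    for ch in token.lower():
        if ch in valid_punctuation:
            # Punctuation acts as an optional word break: close the current part.
            if current:
                parts.append(current)
            current = ""
        else:
            current += ch
    if current:
        parts.append(current)
    if len(parts) <= 1:
        # No punctuation inside the token, so it is indexed as-is.
        return parts
    return ["".join(parts)] + parts


print(expand_compound("post-doctoral"))  # ['postdoctoral', 'post', 'doctoral']
```

With a longer valid_punctuation string, a token taken from source code would split into many parts in the same way, which is the "not really words" case mentioned in the quote.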

Thanks for that clarification, Gilles.

Would it be better to convert a query for post-doctoral into the phrase "post doctoral", and to store simply the words post and doctoral in the database? As it stands, a search for "the non-smoker" will match "the smoker", since all the words are given the same position in the database. The change would also reduce the size of the database (marginally in most cases, but significantly for pathological documents).

Now that there is phrase searching, is there any benefit to the current approach? If not, we could do away with valid_punctuation entirely (after 3.2.0b5).

> if you're going to feed a bunch of C code into htdig, you
> should probably do so with a severely stripped down setting of
> valid_punctuation.... However,
> if the underlying word database is solid, then it shouldn't fall
> apart no matter how much junk you throw at it. the root
> cause of the trouble seems to be a bug somewhere in the code.

My thoughts exactly. I'm only using this page for debugging...

Cheers,
Lachlan

_______________________________________________
htdig-dev mailing list
https://lists.sourceforge.net/lists/listinfo/htdig-dev
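
The false match and the proposed alternative can be sketched in Python as follows. This is only an illustration under stated assumptions, not how htdig or htsearch are implemented: all function names are hypothetical, text is tokenised on whitespace, "-" is the only valid_punctuation character, and the query side of the current scheme is assumed to expand a compound into alternatives that share one phrase slot.

```python
import re

PUNCT = "-"  # assumed valid_punctuation value for this sketch


def parts_of(token):
    """Split a token at the punctuation characters, dropping empty pieces."""
    pattern = "[" + re.escape(PUNCT) + "]"
    return [p for p in re.split(pattern, token.lower()) if p]


def index_shared(text):
    """Current scheme as described: joined form and parts share one position."""
    postings = []
    for pos, token in enumerate(text.split()):
        parts = parts_of(token)
        if not parts:
            continue
        terms = parts if len(parts) == 1 else ["".join(parts)] + parts
        postings.extend((term, pos) for term in terms)
    return postings


def index_sequential(text):
    """Proposed scheme: store only the parts, each at its own position."""
    postings = []
    pos = 0
    for token in text.split():
        for part in parts_of(token):
            postings.append((part, pos))
            pos += 1
    return postings


def phrase_match(postings, slots):
    """True if, from some start position p, slot i has one of its terms at p + i."""
    present = set(postings)
    starts = {pos for _, pos in postings}
    return any(
        all(any((term, p + i) in present for term in slot)
            for i, slot in enumerate(slots))
        for p in starts
    )


def query_shared(query):
    # Assumption: each query token becomes one slot holding all its expansions.
    return [{term for term, _ in index_shared(token)} for token in query.split()]


def query_sequential(query):
    # Proposed: the compound becomes an ordinary phrase of its parts.
    return [{term} for term, _ in index_sequential(query)]


query = "the non-smoker"
print(phrase_match(index_shared("the smoker"), query_shared(query)))          # True
print(phrase_match(index_sequential("the smoker"), query_sequential(query)))  # False
print(phrase_match(index_sequential("the non-smoker"), query_sequential(query)))  # True
print(len(index_shared(query)), len(index_sequential(query)))  # 4 3
```

In this sketch the sequential scheme rejects "the smoker" while still matching "the non-smoker", and it stores one fewer posting per two-part compound because the joined form is no longer kept.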
