On Thursday 20 February 2003 02:19, Gilles Detillieux wrote:
> According to Lachlan Andrew:
> > I hadn't realised it, but the
> > valid_punctuation  attribute seems to be treated as an *optional*
> > word break.  (The docs say it is *not* a word break)
>
> I guess the docs haven't kept up with what the code does.
> This functionality was extended to also index each word part,
> so that something like "post-doctoral" gets indexed as
> postdoctoral, post and doctoral. This greatly enhances searches for
> compound words, or parts thereof, but it tends to break down when
> you're indexing something that's not really words...

Thanks for that clarification, Gilles.

Would it be better to convert a query for  post-doctoral  into the 
phrase "post doctoral", and to store simply the words  post  and  
doctoral  in the database?  As it stands, a search for "the 
non-smoker" will match "the smoker", since the compound and all of 
its parts are given the same position in the database.  The change 
would also reduce the size of the database (marginally in most 
cases, but significantly for pathological documents).  Now that 
there is phrase searching, is there any benefit to the current 
approach?  If not, we could do away with  valid_punctuation  
entirely (after 3.2.0b5).
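
To make the position problem concrete, here is a rough sketch in 
Python.  It is a toy model, not the htdig code: the function names, 
the tokenizer, and the assumption that the hyphen is the only 
valid_punctuation character are all mine, purely for illustration.

```python
import re

def tokens(text):
    # Toy tokenizer (assumption): letters only, with the hyphen as the
    # single valid_punctuation character allowed inside a word.
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())

def index_current(text):
    # Behaviour as described above: the joined compound and each of
    # its parts are all stored at the same word position.
    postings = {}
    for pos, tok in enumerate(tokens(text)):
        parts = tok.split("-")
        if len(parts) > 1:
            words = {"".join(parts)} | set(parts)
        else:
            words = {tok}
        for word in words:
            postings.setdefault(word, set()).add(pos)
    return postings

def index_proposed(text):
    # Proposed alternative: store only the parts, as ordinary
    # consecutive words with their own positions.
    postings = {}
    pos = 0
    for tok in tokens(text):
        for part in tok.split("-"):
            postings.setdefault(part, set()).add(pos)
            pos += 1
    return postings

def query_words(query):
    # Proposed query rewrite: a hyphenated term becomes a phrase of
    # its parts, e.g. post-doctoral -> "post doctoral".
    return [part for tok in tokens(query) for part in tok.split("-")]

def phrase_match(postings, words):
    # True if the words occur at consecutive positions.
    if not words:
        return False
    return any(
        all(start + i in postings.get(w, set()) for i, w in enumerate(words))
        for start in postings.get(words[0], set())
    )

doc = "the non-smoker"
current = index_current(doc)
proposed = index_proposed(doc)

# Current scheme: "smoker" shares position 1 with "non" and "nonsmoker",
# so the phrase "the smoker" matches.
false_hit = phrase_match(current, ["the", "smoker"])

# Proposed scheme: "smoker" sits at position 2, so only the full
# phrase "the non smoker" matches.
no_false_hit = phrase_match(proposed, ["the", "smoker"])
real_hit = phrase_match(proposed, query_words("the non-smoker"))
```

In this model the current scheme stores four entries for the 
document (the, nonsmoker, non, smoker) and the proposed one stores 
three, which is the marginal size saving mentioned above.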

> If you're going to feed a bunch of C code into htdig, you
> should probably do so with a severely stripped down setting of
> valid_punctuation.... However,
> if the underlying word database is solid, then it shouldn't fall
> apart no matter how much junk you throw at it.  The root
> cause of the trouble seems to be a bug somewhere in the code.

My thoughts exactly.  I'm only using this page for debugging...

Cheers,
Lachlan


_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev