According to Amélie Frenette:
> I have asked you questions a couple of times and I thank you very much for your 
> work! Another challenge is the following...
> 
> search characters "l'" gives no results (it is in the stop words list)
> search characters "l'asthme" 55 results
> search characters "asthme" 111 results 
> 
> I wonder if you can help me with this one ?

Well, that depends on what the problem is.  Note that htdig's bad_words
list is not a list of stop words in the sense that htdig would stop
indexing if it encounters one of these words.  It merely suppresses
these words from the search database, and from the search query.
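As a rough illustration of that suppression, here is a toy model in Python.
It is not htdig's actual code: BAD_WORDS and filter_words are made-up names,
and the word list is only a stand-in for whatever your bad_words file holds.

```python
# Hypothetical stand-in for the contents of a bad_words file.
BAD_WORDS = {"l", "d", "le", "la", "les"}

def filter_words(words):
    """Drop bad words but keep going: the remaining words are still processed."""
    return [w for w in words if w.lower() not in BAD_WORDS]

# The same filter is applied on both sides.
indexed = filter_words(["la", "crise", "d", "asthme"])   # words found in a document
searched = filter_words(["l", "asthme"])                 # words typed in a query
only_bad = filter_words(["l"])                           # a query made of a bad word alone
```

In this model a query consisting only of a bad word is left with nothing to
look up, which is consistent with a search for "l'" returning no results.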

So, given that, what you describe above doesn't sound like a problem
to me.  If the apostrophe is in valid_punctuation, as it is by default,
then when htdig encounters a word like "l'asthme" it will try to index
the whole word, stripped of punctuation, plus any component parts of it.
In this case, the "l" is suppressed, so only "asthme" and "lasthme" go
into the word database.  htsearch doesn't do the splitting, however,
so a query for "l'asthme" would only match "lasthme" in the database,
while a query for "asthme" would match "asthme" in the database,
whether it came from the word by itself, or from "l'asthme" after the
"l'" was stripped off.  So, it's logical that it would get more matches.
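
To make the asymmetry concrete, here is a minimal sketch in Python of the
behaviour described above. It is a simplified model under stated assumptions,
not htdig's implementation: only the apostrophe is treated as valid
punctuation, "l" is the only bad word, and all function names and the three
sample documents are invented for the example.

```python
import re

VALID_PUNCTUATION = "'"   # assumption: only the apostrophe is modelled
BAD_WORDS = {"l"}         # assumption: "l" is the only suppressed word

def strip_punct(word):
    """Return the word with valid punctuation removed."""
    return "".join(ch for ch in word.lower() if ch not in VALID_PUNCTUATION)

def index_terms(word):
    """Indexing side: whole word stripped of punctuation, plus its parts."""
    whole = strip_punct(word)
    parts = re.split("[" + re.escape(VALID_PUNCTUATION) + "]", word.lower())
    return {t for t in [whole, *parts] if t and t not in BAD_WORDS}

def build_index(docs):
    """Map each stored term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            for term in index_terms(word):
                index.setdefault(term, set()).add(doc_id)
    return index

def search(index, query):
    """Search side: strip punctuation only, with no splitting into parts."""
    term = strip_punct(query)
    if not term or term in BAD_WORDS:
        return set()
    return index.get(term, set())

# Invented sample documents.
docs = {
    1: "l'asthme chronique",
    2: "asthme sévère",
    3: "traitement de l'asthme",
}
index = build_index(docs)
```

Here search(index, "l'asthme") looks up "lasthme" and finds only documents
1 and 3, while search(index, "asthme") finds all three, since "asthme" was
stored both from the bare word and as a component of "l'asthme".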

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html