According to Andrew Scherpbier:
> One of things that seems to be hard to deal with is defining exactly what a
> word is. Everyone is most likely very much aware of my first attempt at
> this: valid_punctuation. :-) Well, I think a much better method would be to
> add multiple word permutations to the database. For example something like
> "D'Amore" (last name of one of my coworkers) could be entered into the
> database as "d'amore" and "amore". The problem there is the word location.
> My first thought was to give them both the same location number, but they
> really aren't the same word, so a phrase search (which would presumably need
> to be done on the *exact* words, not permutations) could possibly give
> incorrect results. Maybe a better example would be something like
> "word-source" which would be entered into the database as "word", "source",
> and "word-source". What are the locations for those words, then?
To give a few examples, I think all of these phrases should be treated as
equivalent in a phrase search:
Linux User Group
Linux Users Group
Linux User's Group
Linux Users' Group
Linux User-Group
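As a rough illustration of how the five phrases above could be made to compare equal, here is a minimal Python sketch. The function name phrase_key and the crude plural rule (drop a trailing "s" from words longer than three letters) are assumptions made for this example only, not anything ht://Dig implements; a real stemmer would be needed to avoid mangling words like "glass".

```python
def phrase_key(phrase):
    """Reduce a phrase to a tuple of simplified words (hypothetical scheme)."""
    # Treat a hyphen as a word separator, so "User-Group" becomes two words.
    words = phrase.lower().replace("-", " ").split()
    key = []
    for word in words:
        # Drop apostrophes, so "user's" and "users'" both become "users".
        word = word.replace("'", "")
        # Crude plural stripping; an assumption for illustration only.
        if len(word) > 3 and word.endswith("s"):
            word = word[:-1]
        key.append(word)
    return tuple(key)


phrases = [
    "Linux User Group",
    "Linux Users Group",
    "Linux User's Group",
    "Linux Users' Group",
    "Linux User-Group",
]
keys = {phrase_key(p) for p in phrases}
```

With this scheme all five phrases reduce to the same key, so keys ends up holding a single tuple.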
Also, any of these ought to match the same word:
cooperation
co-operation
coöperation (diaeresis, seldom used in English, but valid
nonetheless)
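One possible way to get those three spellings to match is to compare them by a normalized key. The sketch below is only an illustration in Python: the name match_key is hypothetical, and it assumes that dropping every hyphen and every combining accent is acceptable, which will not hold for all languages.

```python
import unicodedata


def match_key(word):
    """Return a comparison key with hyphens and accents removed (a sketch)."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(
        ch for ch in decomposed
        if not unicodedata.combining(ch) and ch != "-"
    )


variants = ["cooperation", "co-operation", "co\u00f6peration"]
keys = {match_key(v) for v in variants}
```

All three variants reduce to the key "cooperation", so they would be found by the same lookup.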
Note that for user-group, and for many other hyphenated compounds, you want
to treat the parts as separate words, but in some cases the hyphen should be
ignored and the whole compound treated as a single word. For example,
electro-physiology = electrophysiology and post-doctoral = postdoctoral,
but activity-dependent = activity dependent and full-time = full time.
I think the only way to deal with these consistently would be to enter
the individual words, and their concatenation, separately into the
database. So word-source should be entered as word, source, wordsource,
and possibly word-source, depending on how htsearch will deal with the
hyphen.
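The expansion described above can be sketched in a few lines of Python. This is only an illustration: the function name index_variants is made up for the example, it always includes the hyphenated form (which the paragraph above leaves optional), and it does not address what word location each entry should get.

```python
def index_variants(token):
    """List the terms to index for one token (hypothetical scheme)."""
    token = token.lower()
    parts = [p for p in token.split("-") if p]
    if len(parts) < 2:
        # Not a hyphenated compound: index the token as it is.
        return [token]
    # Individual words, their concatenation, then the original form.
    candidates = parts + ["".join(parts), token]
    variants = []
    for term in candidates:
        # Skip duplicates while keeping the order of first appearance.
        if term not in variants:
            variants.append(term)
    return variants


terms = index_variants("word-source")
```

Here terms is ["word", "source", "wordsource", "word-source"], matching the list of entries proposed above.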
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930