Re: [htdig-dev] Re: Accents, endings and chaining

Neal Richter Mon, 07 Jun 2004 12:05:47 -0700

> Greetings Dominique,
>
> I have tried to reproduce your problem (as I understood it), but
> can't.  Several possibilities come to mind:
>   1. You are (as Gilles suggested) relying on the fuzzy rule "accents"
>      rather than explicitly entering the accent into the query.  In
>      this case, you are out of luck.
>   2. Your  endings_dictionary  file doesn't contain the words with
>      actual accents.
>   3. Your  endings_dictionary  has the accents, but encoded as
>      multi-byte unicode sequences.  Currently, ht://Dig doesn't
>      support unicode.
> In either case 2 or case 3, the solution is to replace the entries in
> your  endings_dictionary  file with the single-byte latin1 (not
> unicode) accents.
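The re-encoding suggested for cases 2 and 3 can be done in bulk rather than by hand. Here is a minimal sketch, assuming the dictionary file is currently UTF-8 and should become ISO-8859-1 (latin1). The function names are hypothetical. The standard `iconv -f UTF-8 -t ISO-8859-1 old > new` command does the same job from the shell.

```python
from pathlib import Path


def utf8_to_latin1(data: bytes) -> bytes:
    # Decode multi-byte UTF-8 sequences, then re-encode each character as a
    # single ISO-8859-1 byte. Characters with no latin1 equivalent raise
    # UnicodeEncodeError instead of being silently dropped.
    return data.decode("utf-8").encode("latin-1")


def convert_dictionary(src: str, dst: str) -> None:
    # Hypothetical helper: rewrite a whole endings_dictionary-style file.
    Path(dst).write_bytes(utf8_to_latin1(Path(src).read_bytes()))
```

For example, the two-byte UTF-8 sequence for "é" (`b"\xc3\xa9"`) becomes the single latin1 byte `b"\xe9"`.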


  Here are two possible approaches:

1) Strip accents from all stored words & queries.  This is a fairly common
practice in search engines & NLP systems.  The obvious disadvantage is
that a user can't restrict results to a specific accent: they get back
results containing every accented variant of the same 'base letter'.
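Approach 1 can be sketched as follows, assuming words are available as Unicode strings. This is an illustration only, not ht://Dig's code (ht://Dig works on latin1 bytes, where a lookup table would play the same role). Unicode NFD decomposition splits "é" into "e" plus a combining accent, which is then dropped. The key point is that the same folding must run at index time and at query time. Letters such as "ø" or "æ" have no decomposition and pass through unchanged.

```python
import unicodedata


def strip_accents(word: str) -> str:
    # NFD turns "é" into "e" followed by U+0301 COMBINING ACUTE ACCENT;
    # dropping the combining marks leaves the base letters.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_terms(text: str) -> list[str]:
    # Hypothetical helper: apply the same folding to documents and queries,
    # so "élève", "eleve" and "elève" all map to the same index term.
    return [strip_accents(w) for w in text.lower().split()]
```

For example, `normalize_terms("Élève café")` yields `["eleve", "cafe"]`.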

2) Store BOTH the accented word & the unaccented/stripped word in
db.words.db.  Silently augment each search query with the stripped version
of each word.
  This sidesteps the disadvantage of #1 while still getting the
'generalization' of stripped accents.
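Here is a minimal sketch of approach 2. An in-memory dict stands in for db.words.db. The `DualFormIndex` class and its `expand` flag are hypothetical and are not part of ht://Dig. Each word is indexed under both its original and its stripped form, and queries are widened with the stripped form by default. Because the accented form is still stored on its own, a search for the exact accented word with expansion turned off is restricted to that accent.

```python
import unicodedata
from collections import defaultdict


def strip_accents(word: str) -> str:
    # Same NFD-based folding as in approach 1.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


class DualFormIndex:
    """Hypothetical stand-in for db.words.db storing both word forms."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add_document(self, doc_id: int, text: str) -> None:
        for word in text.lower().split():
            self.postings[word].add(doc_id)                 # accented form
            self.postings[strip_accents(word)].add(doc_id)  # stripped form

    def search(self, word: str, expand: bool = True) -> set[int]:
        word = word.lower()
        terms = {word}
        if expand:
            # Silently add the stripped version of the query word.
            terms.add(strip_accents(word))
        result: set[int] = set()
        for term in terms:
            result |= self.postings.get(term, set())
        return result
```

After indexing document 1 as "Le résumé" and document 2 as "resume writing", `search("résumé")` returns `{1, 2}`, while `search("résumé", expand=False)` returns only `{1}`. The cost of this design is a larger word database, since most accented words are stored twice.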

Thanks

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485




_______________________________________________
ht://Dig Developer mailing list:
[EMAIL PROTECTED]
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev
