According to Emilio:
> Hi, all.
> (3.1.6 on RedHat 7.2)
>
> Now that I know how to perform a prefix search, I have found a problem with
> accents on a Spanish site that makes the methods incompatible.
>
> Say I search for the word 'neuropéptido' (if you see a strange letter
> between 'neurop' and 'ptido' you should interpret it as an 'e' with an
> acute tilde above).
>
> As the database is fuzzy-indexed with accents and endings, and the words in
> the original documents are accentuated (tilded, have the written accent
> on), htsearch internally looks up for the following words:
>
> neuropeptido
> neuropéptido
> neuropÉptido (uppercase E with acute tilde above)
> neuropeptidos
> neuropéptidos
> neuropÉptidos (uppercase E with acute tilde above)
>
> and I get the expected results. I get the same results if I search for
> 'neuropeptido', 'neuropéptidos' or 'neuropeptidos'.
>
> But if I search (prefix search) for the word 'neuropép*', the uppercase
> accented versions ('neuropÉptido' and 'neuropÉptidos') are not looked up in
> the index and I get fewer matches.
What is your locale setting in htdig.conf? If properly configured,
htdig should be mapping ALL upper-case letters, whether accented or not,
to lower-case. There should not be any distinction between É and é in
the word database, nor in search results.
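
To illustrate why the locale matters, here is a rough analogy in Python. It is
not htdig's own code (htdig is C++ and relies on the C library's locale); it
only assumes the documents are Latin-1 encoded, and the variable names are
made up for the example. Lower-casing that only knows ASCII leaves the byte
for É untouched, while lower-casing that knows the character set folds it to
é, so both forms end up as the same word.

```python
# Illustration only: an analogy for locale-aware case folding, not htdig code.
# Assumption: the word arrives as Latin-1 (ISO-8859-1) bytes.
raw = "NEUROPÉPTIDO".encode("latin-1")

# bytes.lower() maps only ASCII A-Z, much like lower-casing without a
# suitable locale: the accented capital (byte 0xC9) is left as it is.
ascii_only = raw.lower()

# Decoding first lets lower() treat É as a letter and map it to é (0xE9).
charset_aware = raw.decode("latin-1").lower().encode("latin-1")

print(ascii_only)     # accented capital survives
print(charset_aware)  # fully lower-cased
```

If the index was built with the first kind of folding, the upper-case
accented forms stay in the database as separate words.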
> Finally, if I search for the word 'neuropep*', only nonaccentuated versions
> are looked up in the index, and I get no matches at all because
> non-accentuated versions of these words are not in any document.
Yes, this is a known limitation of htsearch. The problem is that each
fuzzy match algorithm is only applied to the original search words,
but not to the results of other fuzzy match methods (i.e. they are not
chained together). So the prefix algorithm isn't applied to extra words
found by the accents algorithm, nor the other way around.
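
The effect can be sketched with a toy model in Python. This is not how
htsearch is implemented; the word list, the function names and the "chained"
variant are all hypothetical, and only show why 'neuropep*' finds nothing
when each algorithm sees just the original search word.

```python
import unicodedata

# Hypothetical word database: only accented forms occur in the documents.
WORD_DB = {"neuropéptido", "neuropéptidos", "neurona"}


def strip_accents(word):
    """Return the word with combining accent marks removed."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def accents(word):
    """Database words equal to `word` once accents are ignored."""
    key = strip_accents(word)
    return {w for w in WORD_DB if strip_accents(w) == key}


def prefix(word):
    """Database words starting with the text before a trailing '*'."""
    if not word.endswith("*"):
        return set()
    stem = word[:-1]
    return {w for w in WORD_DB if w.startswith(stem)}


def search_unchained(word):
    # Each algorithm is applied to the original search word only.
    return accents(word) | prefix(word)


def search_chained(word):
    # Hypothetical combination: prefix matching on accent-stripped forms.
    if not word.endswith("*"):
        return accents(word)
    stem = strip_accents(word[:-1])
    return {w for w in WORD_DB if strip_accents(w).startswith(stem)}


print(search_unchained("neuropeptido"))  # accents alone finds the word
print(search_unchained("neuropep*"))     # neither algorithm matches
print(search_chained("neuropep*"))       # combining them would match
```

In this model the unaccented whole word still matches through the accents
step, but the unaccented prefix matches nothing, because the prefix step
compares against the accented database words directly.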
> Due to the nature of the site, most of the words searched by visitors (such as
> neuropéptido) are not included in the standard Spanish package. I suppose it
> is an additional difficulty.
Possibly. It doesn't matter for the accents algorithm, as it's applied
to your word database, and not to a standard dictionary. However,
the endings algorithm is only applied to the words and rules defined by
your "endings_dictionary" and "endings_affix_file" attributes (i.e. the
spanish.0 and spanish.aff files). You may find it necessary to expand
this dictionary to include terminology commonly used on your site.
The other "standard" dictionary is the synonym_dictionary, which you
use to define equivalent words in your language and/or your jargon, for
use with the "synonyms" fuzzy match algorithm.
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
FAQ: http://htdig.sourceforge.net/FAQ.html