According to Sunny Fortune:
> My site contains documents in both English as well as
> Spanish. I would like my search to accept both english
> and spanish words and return the appropriate
> documents. 
> 
> Is there a way of indexing my site by using a single
> configuration file having locale set to "C" as well as
> "es_MX" and setting the dictionary attributes to the
> correct paths?

English is pretty easy to index with another language, because any
ISO-8859-* based locale will include the whole 7-bit ASCII set.  So,
with a locale of es_MX, htdig won't have any problems indexing English
and Spanish words (assuming the locale actually works on your system).
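
As a minimal sketch of the indexing side, the one relevant line in the
configuration file would look like this.  It assumes the locale is
installed on your system under exactly that name; "locale -a" will list
what's available, and the spelling varies from system to system.

```
# htdig.conf (fragment): a single locale covers both English and Spanish
locale: es_MX
```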

In general, documents of different languages can be indexed together as
long as they share the same encoding, and you have a locale that supports
that full encoding.  The problem is that the locale definitions on some
systems limit the recognized accents to a subset of the full encoding.
That's not a problem in this case because the English alphabet is a
subset of the Spanish one.  If you can index Spanish words, you can
index English ones too.

The dictionaries are a bit more complicated, but less critical.  They're
only used for fuzzy matching, using the "endings" algorithm.  If you want
to support the endings algorithm in both languages, you'd need to set up
two different configuration files and build the dictionaries separately
for the two languages, then allow user selection of one or the other
configuration file via the "config" input parameter in the search form.
(See FAQ 4.10 and 4.2 at http://www.htdig.org/FAQ.html)
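
A rough sketch of that two-file setup follows.  The file names
english.conf and spanish.conf, and all of the paths, are only
placeholders for illustration; substitute whatever matches your
installation and the word and affix files you actually have.  Everything
not shown would be the same in both files, so that both search the same
index.

```
# english.conf (fragment): endings built from the English dictionary
locale:               es_MX
endings_affix_file:   /opt/htdig/common/english.aff
endings_dictionary:   /opt/htdig/common/english.0
endings_root2word_db: /opt/htdig/common/en_root2word.db
endings_word2root_db: /opt/htdig/common/en_word2root.db

# spanish.conf (fragment): same index, Spanish dictionary instead
locale:               es_MX
endings_affix_file:   /opt/htdig/common/spanish.aff
endings_dictionary:   /opt/htdig/common/spanish.words
endings_root2word_db: /opt/htdig/common/es_root2word.db
endings_word2root_db: /opt/htdig/common/es_word2root.db
```

You would then run "htfuzzy -c <file> endings" once for each of the two
files to build its databases.  In the search form, the "config" input
would take the value "english" or "spanish", i.e. the file name without
the .conf extension.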

It's theoretically possible, but may be a bit tedious, to actually merge
the word and affix files for two languages to make a combined endings
database for the two.  The difficulty is in resolving any conflicts in
affix definitions by relettering some of the affix codes.

If you only need to support "endings" in one language, or not at all, then
this need not be a concern either.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
