question about multiple languages

Maciej Liżewski Mon, 08 Oct 2012 08:03:32 -0700

Hi,

I would like to know what the default approach is for handling multiple
languages in documents. I know there is a component for the
"update"/"extract" process that can "automagically" guess the
language, put the language name in an attribute, and map field names
to "*_[lang]". (I know this is not a general Solr forum, but I think
there are experienced developers here.)


Now there are two possibilities:
1. Fields are left untouched: processing (stemming, etc.) is the same
for every document, which is rather wrong because Polish stemming is
different from English stemming... :)
2. Attributes are mapped to *_lang, and every *_lang field has its own
processing definition (stemming, stop words, etc.).
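
To make option 2 concrete, here is a minimal sketch of the kind of mapping I mean. It is plain Python for illustration only, not the actual component: the function name map_fields, the "language" attribute name and the sample document are all my own assumptions.

```python
def map_fields(doc, lang, fields=("text",)):
    """Return a copy of doc where the listed fields are renamed to <field>_<lang>.

    Hypothetical helper: it only illustrates the "*_[lang]" naming scheme and
    assumes the language has already been detected elsewhere.
    """
    mapped = {}
    for name, value in doc.items():
        if name in fields:
            # e.g. "text" becomes "text_pl" for a Polish document
            mapped[f"{name}_{lang}"] = value
        else:
            mapped[name] = value
    # Store the detected language in its own attribute (name is an assumption)
    mapped["language"] = lang
    return mapped


print(map_fields({"id": "1", "text": "kot"}, "pl"))
```

Each resulting *_lang field would then get its own analysis chain in the schema.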

This part I understand,
but I am confused about how to perform valid queries in both cases. I
have a single (simple) page which should work Google-like: you enter
text and get results. But there is no "language guess" process for
queries... Do I have to specify on each query whether it should search
the 'text_en' or 'text_pl' field? If so, that is not very good, because
I would like users to get all documents that match the query no matter
what language they are written in. There are many similar words,
technical names, etc., which are the same in many languages...
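
To illustrate what querying every language field explicitly would look like, here is a rough sketch. It is only my assumption of how such a query string could be assembled on the client side; the helper name build_query and the language list are hypothetical, and the user text is quoted as a phrase simply to avoid dealing with query-syntax escaping.

```python
def build_query(user_text, languages=("en", "pl"), base_field="text"):
    """Build one query string that ORs the same text across all *_lang fields.

    Hypothetical helper: field names follow the "text_en" / "text_pl" pattern
    from this message; the language list is an assumption.
    """
    # Escape backslashes and double quotes so the text is safe inside a phrase
    escaped = user_text.replace("\\", "\\\\").replace('"', '\\"')
    clauses = [f'{base_field}_{lang}:"{escaped}"' for lang in languages]
    return " OR ".join(clauses)


print(build_query("solr"))
```

This works, but it means the list of languages has to be known and maintained on the query side, which is what I would like to avoid.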

In other words: how can I achieve Google-like search with stemming for
multiple languages without forcing users to select the language they
would like to search in?
