Hi,
I have a Lucene index containing documents written in different languages.
Each document is written in only one language and I have a *language* field
containing the corresponding language identifier (it, en, fr, ...).
The *content* is stored in a separate field for each language (e.g.
contents_it, contents_en, ...) and I use a language-specific analyzer for
each of these fields.
When users enter a query, they also select the language they are writing
in, so I can create a *QueryParser* with the right *defaultField* and
*analyzer*.
This works fine, but with this approach users can only find documents
written in the same language as the query.
Now I would like to *translate* the user's query so that it also finds
documents written in other languages (that match the same query).
For example:
* *user_query* = cane, *query_language* = it
* At the moment, using the standard *QueryParser*, I obtain this query -->
contents_it:cane
* In the new scenario, I would like to have this query -->
(contents_it:cane contents_en:dog contents_fr:chien)
but also
* *user_query* = +"operating system" -linux, *query_language* = en
* I would like to have this query -->
+(contents_en:"operating system" contents_it:"sistema operativo")
-(contents_en:linux contents_it:linux)
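To make the expected result more precise, here is a minimal sketch of the expansion rule in plain Java. It does not use Lucene at all and only builds the query string I would like to end up with. The class name, the tiny dictionary and the translate helper are hypothetical stand-ins for my real translation service; the contents_<language> field naming is the one described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: builds the expected query *string*, without using Lucene.
final class ExpansionSketch {

    // Hypothetical stand-in for the translation service.
    // Key: "<text>|<target language>", value: the translation.
    static final Map<String, String> DICTIONARY = new LinkedHashMap<String, String>();
    static {
        DICTIONARY.put("cane|en", "dog");
        DICTIONARY.put("cane|fr", "chien");
        DICTIONARY.put("operating system|it", "sistema operativo");
    }

    // Text without a known translation is returned unchanged (e.g. "linux").
    static String translate(String text, String targetLanguage) {
        String translated = DICTIONARY.get(text + "|" + targetLanguage);
        return translated != null ? translated : text;
    }

    // Turns one term or phrase into a group of optional clauses, one per language.
    // The query language comes first, then the other languages in the given order.
    static String expand(String text, String queryLanguage, String[] languages) {
        StringBuilder group = new StringBuilder("(");
        group.append(clause(text, queryLanguage));
        for (String language : languages) {
            if (!language.equals(queryLanguage)) {
                group.append(' ').append(clause(translate(text, language), language));
            }
        }
        return group.append(')').toString();
    }

    // Phrases (text containing a space) are quoted, single terms are not.
    static String clause(String text, String language) {
        String value = text.indexOf(' ') >= 0 ? "\"" + text + "\"" : text;
        return "contents_" + language + ":" + value;
    }
}
```

With the languages it, en and fr, expand("cane", "it", ...) gives the first query above. The +/- operators stay on the whole group, so the second query is "+" + expand("operating system", "en", ...) followed by " -" + expand("linux", "en", ...) with the languages en and it.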
Suppose that:
* for each index/application I have a fixed number of available languages,
each with its own *defaultField* and specific *analyzer*.
* I already have a service that is able to translate words and/or short
phrases between the languages I am interested in.
I was thinking about extending *QueryParser* and overriding some methods to
add my custom behaviour.
This looks quite easy for TermQuery, for example by doing something like this:
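// Inside my QueryParser subclass. "languages", "queryLanguage" and
// translateTerm(...) are my own fields/helpers, not part of Lucene.
@Override
protected Query newTermQuery(Term term) {
    BooleanQuery bq = new BooleanQuery();
    // the original term, in the language of the query
    bq.add(new TermQuery(term), BooleanClause.Occur.SHOULD);
    // one optional clause for each of the other languages
    for (String language : languages) {
        if (language.equals(queryLanguage)) {
            continue;
        }
        TermQuery translatedTQ = translateTerm(term, queryLanguage, language);
        bq.add(translatedTQ, BooleanClause.Occur.SHOULD);
    }
    return bq;
}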
But it looks considerably harder for other query types (unless I *rewrite
QueryParser* instead of extending it).
Am I missing something? Is there a better approach to achieve the same goal?
I am using *Lucene 3.0.3* and, for now, I cannot upgrade to a more recent
version.
Thanks in advance,
Bye.
*Raf*