[ 
https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043031#comment-16043031
 ] 

Jan Rasehorn edited comment on SOLR-6492 at 6/8/17 5:50 PM:
------------------------------------------------------------

The issue with the approach above is that for short texts a different language 
might be detected. As a result, the same words are stemmed differently by the 
query analyzer than by the index analyzer. 
So I chose another strategy for the query analyzer.
I simply create copies of the query tokens and add a language payload for each 
language I want to support. After that, I apply the same approach used in the 
index analyzer, calling the appropriate stemmer for each language through my 
"DelegatingFilter". In the end, every query token is copied and stemmed by each 
stemmer independently, regardless of which language the token actually belongs 
to.
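
To illustrate the idea, here is a minimal sketch in plain Python rather than 
actual Lucene/Solr code. Everything in it is an assumption made for 
illustration: the Token class, the "lang" attribute standing in for the 
language payload, the toy suffix-stripping stemmers, and the function names. 
It is not the real "DelegatingFilter", only the same flow of copy, tag, and 
delegate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Token:
    text: str
    position: int
    lang: str = ""  # hypothetical stand-in for the language payload


def strip_suffixes(suffixes: Tuple[str, ...]) -> Callable[[str], str]:
    """Build a toy stemmer that strips the first matching suffix.

    Real stemmers (e.g. Snowball) are far more involved; this only
    makes the per-language difference visible.
    """
    def stem(word: str) -> str:
        for suffix in suffixes:
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word
    return stem


# Hypothetical set of supported languages and their toy stemmers.
STEMMERS: Dict[str, Callable[[str], str]] = {
    "en": strip_suffixes(("ing", "es", "s")),
    "de": strip_suffixes(("en", "er", "e")),
}


def copy_per_language(tokens: Iterable[Token], langs: List[str]) -> Iterator[Token]:
    """Emit one copy of each token per language, at the same position."""
    for token in tokens:
        for lang in langs:
            yield Token(token.text, token.position, lang)


def delegating_filter(tokens: Iterable[Token],
                      stemmers: Dict[str, Callable[[str], str]]) -> Iterator[Token]:
    """Pick the stemmer from the token's language tag; pass unknown tags through."""
    for token in tokens:
        stem = stemmers.get(token.lang, lambda word: word)
        yield Token(stem(token.text), token.position, token.lang)


def analyze_query(text: str,
                  stemmers: Dict[str, Callable[[str], str]] = STEMMERS) -> List[Token]:
    """Whitespace-tokenize, copy each token per language, then stem each copy."""
    tokens = [Token(word, i) for i, word in enumerate(text.lower().split())]
    copies = copy_per_language(tokens, list(stemmers))
    return list(delegating_filter(copies, stemmers))


if __name__ == "__main__":
    for token in analyze_query("running laufen"):
        print(token)
```

With these toy stemmers, the query "running laufen" yields "runn" (en) and 
"running" (de) at position 0, and "laufen" (en) and "lauf" (de) at position 1. 
Whichever language the index-time detection assigned to a document, one of the 
query-side variants is stemmed by the same stemmer.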


> Solr field type that supports multiple, dynamic analyzers
> ---------------------------------------------------------
>
>                 Key: SOLR-6492
>                 URL: https://issues.apache.org/jira/browse/SOLR-6492
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Trey Grainger
>             Fix For: 5.0
>
>
> A common request - particularly for multilingual search - is to be able to 
> support one or more dynamically-selected analyzers for a field. For example, 
> someone may have a "content" field and pass in a document in Greek (using an 
> Analyzer with Tokenizer/Filters for Greek), a separate document in English 
> (using an English Analyzer), and possibly even a field with mixed-language 
> content in Greek and English. This latter case could pass the content 
> separately through both an analyzer defined for Greek and another Analyzer 
> defined for English, stacking or concatenating the token streams based upon 
> the use-case.
> There are some distinct advantages in terms of index size and query 
> performance which can be obtained by stacking terms from multiple analyzers 
> in the same field instead of duplicating content in separate fields and 
> searching across multiple fields. 
> Other non-multilingual use cases may include things like switching to a 
> different analyzer for the same field to remove a feature (e.g. turning 
> on/off query-time synonyms against the same field on a per-query basis).


