Re: Use case of multiple Language Analyzer, Hunspell along with Elasticsearch Langdetect Plugin

[email protected] Wed, 24 Sep 2014 08:56:55 -0700

The issue tracker address:

https://github.com/jprante/elasticsearch-langdetect/issues


Jörg


On Wed, Sep 24, 2014 at 5:55 PM, [email protected] <
[email protected]> wrote:

> With the langdetect plugin, there is just a field "lang" mapped under the
> string field that is used for detection, and the detected language codes
> are written into this field. This is useful, e.g., for aggregations or for
> filtering documents by language.
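A minimal sketch of the filtering and aggregation use case in Python. It only builds the JSON body for a _search request. It assumes a hypothetical document field "content" mapped with the langdetect plugin, so that the detected codes end up in the "content.lang" sub-field. The aggregation name "languages" is also illustrative. The "post_filter" narrows the hits to one language, while the "terms" aggregation still counts documents across all detected languages.

```python
import json

# Assumption: a field "content" is mapped with the langdetect plugin,
# so the detected language codes are stored in "content.lang".
LANG_FIELD = "content.lang"


def language_search_body(lang_code: str) -> dict:
    """Build a _search body that returns only documents detected as
    `lang_code` and counts documents per detected language."""
    return {
        "query": {"match_all": {}},
        # Counts are computed over all matching documents...
        "aggs": {"languages": {"terms": {"field": LANG_FIELD}}},
        # ...while the returned hits are narrowed to one language afterwards.
        "post_filter": {"term": {LANG_FIELD: lang_code}},
    }


if __name__ == "__main__":
    # Print the body to send to the index's _search endpoint.
    print(json.dumps(language_search_body("hi"), indent=2))
```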
>
> At the moment it is not possible to use, for example, a dynamic "copy_to"
> to duplicate the field after detection into a field with a
> language-specific analyzer such as a synonym analyzer.
>
> A feature request on the GitHub issue tracker would be much appreciated, so
> I can have a look into this.
>
> Jörg
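Until something like a dynamic "copy_to" exists, one possible workaround (not part of the plugin) is to detect the language on the client before indexing, then write the text directly into a per-language field that has its own analyzer. The Python sketch below assumes the language code is already known from some external detector. The field names ("content_en", "content_hi", "content") and the analyzer names passed in are hypothetical, and the mapping uses the ES 1.x "string" type.

```python
# Hypothetical client-side routing: the language code is assumed to come from a
# detector run before indexing; field and analyzer names are illustrative only.
FIELD_BY_LANG = {"en": "content_en", "hi": "content_hi"}
FALLBACK_FIELD = "content"  # used for languages without a dedicated field


def per_language_mapping(analyzer_by_lang: dict) -> dict:
    """Mapping properties with one string field per language,
    each bound to its own (hypothetical) analyzer."""
    props = {FALLBACK_FIELD: {"type": "string"}}
    for lang, field in FIELD_BY_LANG.items():
        props[field] = {"type": "string", "analyzer": analyzer_by_lang[lang]}
    # Keep the raw language code for filtering and aggregations.
    props["lang"] = {"type": "string", "index": "not_analyzed"}
    return {"properties": props}


def route_document(text: str, lang_code: str) -> dict:
    """Put the text into the field for its language and keep the code
    in a separate field."""
    field = FIELD_BY_LANG.get(lang_code, FALLBACK_FIELD)
    return {field: text, "lang": lang_code}
```

For example, route_document("Houses for sale", "en") yields {"content_en": "Houses for sale", "lang": "en"}, while an unmapped code such as "fr" falls back to the "content" field.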
>
>
> On Wed, Sep 24, 2014 at 12:57 PM, Prashant Agrawal <
> [email protected]> wrote:
>
>> Hi All,
>>
>> We have an ES cluster that is used to index a large amount of data in
>> different languages. Our current settings point to the English analyzer
>> and the English Hunspell dictionary. How can we index multilingual data
>> with multilingual analyzers and Hunspell dictionaries configured for the
>> same index? I came across the Elasticsearch Langdetect Plugin
>> (https://github.com/jprante/elasticsearch-langdetect), which is available
>> from ES 1.2.1.
>>
>> Our current analyzer settings are:
>> index :
>>   analysis :
>>       analyzer :
>>         synonym :
>>             tokenizer : whitespace
>>             filter : [synonym]
>>         default_index :
>>             type : custom
>>             tokenizer : whitespace
>>             filter : [ standard, lowercase,hunspell_US]
>>         default_search :
>>             type : custom
>>             tokenizer : whitespace
>>             filter : [standard, lowercase, synonym,hunspell_US]
>>       filter :
>>         synonym :
>>             type : synonym
>>             ignore_case : true
>>             expand : true
>>             synonyms_path : synonyms.txt
>>         hunspell_US :
>>             type : hunspell
>>             locale : en_US
>>             dedup : false
>>             ignore_case : true
>>
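To check which tokens the "default_index" and "default_search" analyzers above actually produce for a given text, the index-level _analyze API can be called with the analyzer name. The Python helper below only builds the request URL. The host and the index name "myindex" are placeholders, and it assumes the ES 1.x form in which "analyzer" and "text" are passed as query-string parameters.

```python
from urllib.parse import urlencode


def analyze_url(host: str, index: str, analyzer: str, text: str) -> str:
    """Build a GET URL for the index-level _analyze API
    (ES 1.x style: analyzer name and text as query parameters)."""
    query = urlencode({"analyzer": analyzer, "text": text})
    return f"{host.rstrip('/')}/{index}/_analyze?{query}"


# Compare index-time and search-time analysis of the same text.
for name in ("default_index", "default_search"):
    print(analyze_url("http://localhost:9200", "myindex", name, "running dogs"))
```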
>>
>> So:
>> 1) Can we configure multilingual analyzers and Hunspell dictionaries for
>> the same index, and then index data with the langdetect plugin configured
>> for specific fields? Will the data then be indexed and analyzed according
>> to the matching language analyzer? And will it be searchable through the
>> multiple Hunspell dictionaries and synonyms configured as well?
>>
>> Please confirm whether the settings below can be used to achieve this:
>>
>>       analyzer :
>>         synonym :
>>             tokenizer : whitespace
>>             filter : [synonym]
>>         default_index :
>>             type : custom
>>             tokenizer : whitespace
>>             filter : [standard, lowercase, hunspell_US, hunspell_IN, hindi, english]
>>         default_search :
>>             type : custom
>>             tokenizer : whitespace
>>             filter : [standard, lowercase, synonym, hunspell_US, hunspell_IN, hindi, english]
>>       filter :
>>         hindi:
>>           tokenizer:  standard
>>           filter: [lowercase]
>>         english:
>>           tokenizer:  standard
>>           filter: [lowercase]
>>         synonym :
>>             type : synonym
>>             ignore_case : true
>>             expand : true
>>             synonyms_path : synonyms.txt
>>         hunspell_US :
>>             type : hunspell
>>             locale : en_US
>>             dedup : false
>>             ignore_case : true
>>         hunspell_IN :
>>             type : hunspell
>>             locale : hi_IN
>>             dedup : false
>>             ignore_case : true
>>
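One point worth checking in the proposed settings: in a custom analyzer, the token filters listed under "filter" run in order on every token the tokenizer emits. Both "hunspell_US" and "hunspell_IN" therefore see English and Hindi tokens alike, and nothing in the chain picks a filter by detected language. The Python sketch below models only that ordering. It uses hypothetical pass-through filters that record which filter saw which token, and it does not reproduce Hunspell stemming.

```python
from typing import Callable, Dict, List

TokenFilter = Callable[[List[str]], List[str]]


def tracing_filter(name: str, trace: Dict[str, List[str]]) -> TokenFilter:
    """Hypothetical stand-in for a token filter: passes tokens through
    unchanged and records that `name` was applied to each of them."""
    def apply(tokens: List[str]) -> List[str]:
        for tok in tokens:
            trace.setdefault(tok, []).append(name)
        return tokens
    return apply


def analyze(text: str, filters: List[TokenFilter]) -> List[str]:
    tokens = text.split()                  # like the whitespace tokenizer
    tokens = [t.lower() for t in tokens]   # like the lowercase filter
    for f in filters:                      # remaining filters, in list order
        tokens = f(tokens)
    return tokens


trace: Dict[str, List[str]] = {}
chain = [tracing_filter(n, trace) for n in ("hunspell_US", "hunspell_IN")]
analyze("Houses घर", chain)
print(trace)
```

Every token appears under both filter names, regardless of its script.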
>>
>> After that, say I have configured the langdetect plugin and indexed some
>> data in different languages, English and Hindi. Since I have configured
>> multiple language analyzers and multilingual Hunspell dictionaries, will I
>> be able to index and search per language, with a different analyzer for
>> each, and get results based on the tokens analyzed for each language?
>>
>> Also, will synonyms work with different languages as well?
>>
>> ~Prashant
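On the synonym question: the "synonym" filter has no notion of language either. It matches token text against the rules loaded from synonyms.txt, so English and Hindi rules can live in the same file, and each rule applies only to tokens that match it. In the "default_search" chain the synonym filter runs after "lowercase" and before the Hunspell filters, so rules are matched against lowercased, unstemmed tokens. The Python sketch below is a simplified model of "expand: true" with "ignore_case: true" for comma-separated, single-word rules. It skips "=>" rules and multi-word synonyms, and the example rules are made up for illustration.

```python
from typing import Dict, List, Set


def parse_synonyms(lines: List[str], ignore_case: bool = True) -> Dict[str, Set[str]]:
    """Simplified parser for Solr-style comma-separated equivalence lines.
    With expand: true, every term on a line maps to all terms on that line."""
    table: Dict[str, Set[str]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=>" in line:
            continue  # blank lines, comments and explicit mappings are skipped here
        terms = [t.strip() for t in line.split(",") if t.strip()]
        if ignore_case:
            terms = [t.lower() for t in terms]
        for term in terms:
            table.setdefault(term, set()).update(terms)
    return table


def expand(tokens: List[str], table: Dict[str, Set[str]]) -> List[List[str]]:
    """For each token, list the terms emitted at its position (sorted)."""
    return [sorted(table.get(tok, {tok})) for tok in tokens]


rules = ["car, Automobile", "घर, मकान"]  # hypothetical English and Hindi rules
table = parse_synonyms(rules)
print(expand(["car", "घर", "road"], table))
```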
>>
>>
>>
>> --
>> View this message in context:
>> http://elasticsearch-users.115913.n3.nabble.com/Use-case-of-multiple-Language-Analyzer-Hunspell-along-with-Elasticsearch-Langdetect-Plugin-tp4063950.html
>> Sent from the ElasticSearch Users mailing list archive at Nabble.com.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/1411556233114-4063950.post%40n3.nabble.com
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
