Re: Use case of multiple Language Analyzer, Hunspell along with Elasticsearch Langdetect Plugin

[email protected] Wed, 24 Sep 2014 08:55:38 -0700

With the langdetect plugin, there is just a "lang" field mapped under the
string field that is used for detection, and the detected language codes
are written to this field. This is useful, for example, for aggregations
or for filtering documents by language.
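
As a rough illustration of such an aggregation and filter, here is a minimal
Python sketch that only builds the search request bodies. The field path
"content.lang" is an assumption (a hypothetical string field "content" with
the detected codes in its "lang" sub-field), and the filter uses the
"filtered" query form of the ES 1.x era. Nothing is sent to a cluster.

```python
import json

# Assumption: the detected codes live in a sub-field named "<field>.lang";
# "content" is a hypothetical field name, not taken from this thread.
LANG_FIELD = "content.lang"


def languages_aggregation(lang_field=LANG_FIELD):
    """Request body that counts documents per detected language code."""
    return {
        "size": 0,  # only the aggregation buckets are wanted, no hits
        "aggs": {"languages": {"terms": {"field": lang_field}}},
    }


def filter_by_language(code, lang_field=LANG_FIELD):
    """Request body that keeps only documents detected as `code`."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {lang_field: code}},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(languages_aggregation(), indent=2))
    print(json.dumps(filter_by_language("hi"), indent=2))
```

Either body could be posted to the index's _search endpoint with any HTTP
client.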


At the moment it is not possible to use something like a dynamic "copy_to"
to duplicate the field, after detection, into a field with a
language-specific analyzer such as a synonym analyzer.
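
Until something like that exists, one possible workaround is to do the
duplication in the indexing application before the document is sent. The
sketch below is not a plugin feature: the field names ("content",
"content_en", "content_hi", "content_other") are hypothetical, and the
language code is assumed to be supplied by the caller from whatever detector
is in use.

```python
# Hypothetical mapping from a detected language code to a target field whose
# mapping would use a language-specific analyzer (all names are assumptions).
FIELD_BY_LANG = {"en": "content_en", "hi": "content_hi"}
FALLBACK_FIELD = "content_other"


def route_by_language(doc, lang, source_field="content"):
    """Return a copy of `doc` with the text duplicated into a per-language field.

    `lang` is the language code detected for `doc[source_field]`; how it is
    detected is left to the caller.
    """
    routed = dict(doc)  # shallow copy so the input document is not mutated
    target = FIELD_BY_LANG.get(lang, FALLBACK_FIELD)
    routed[target] = doc[source_field]
    return routed


if __name__ == "__main__":
    print(route_by_language({"content": "hello world"}, "en"))
```

Each target field would then need its own analyzer in the mapping, and
searches would have to query the field that matches the query language.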

A feature request on the issue tracker at GitHub would be much appreciated,
so I can look into this.

Jörg


On Wed, Sep 24, 2014 at 12:57 PM, Prashant Agrawal <
[email protected]> wrote:

> Hi All,
>
> We have an ES cluster that is used to index a large amount of data in
> different languages. Our current settings point to an English analyzer and
> an English hunspell dictionary. How can we index multilingual data with a
> multilingual analyzer and hunspell setup for the same index? (I came
> across a plugin called Elasticsearch Langdetect Plugin
> (https://github.com/jprante/elasticsearch-langdetect), available from
> ES 1.2.1.)
>
> The current analyzer settings are:
> index :
>   analysis :
>       analyzer :
>         synonym :
>             tokenizer : whitespace
>             filter : [synonym]
>         default_index :
>             type : custom
>             tokenizer : whitespace
>             filter : [ standard, lowercase,hunspell_US]
>         default_search :
>             type : custom
>             tokenizer : whitespace
>             filter : [standard, lowercase, synonym,hunspell_US]
>       filter :
>         synonym :
>             type : synonym
>             ignore_case : true
>             expand : true
>             synonyms_path : synonyms.txt
>         hunspell_US :
>             type : hunspell
>             locale : en_US
>             dedup : false
>             ignore_case : true
>
>
> So here:
> 1) Can we configure multilingual analyzers and hunspell for the same index,
> and then index data by configuring the langdetect plugin for specific
> fields? Will the data then be indexed and analyzed according to the
> language analyzers configured? And will it be searchable using the
> multiple hunspell dictionaries and synonyms configured as well?
>
> Please confirm whether the settings below can be used to achieve this:
>
>       analyzer :
>         synonym :
>             tokenizer : whitespace
>             filter : [synonym]
>         default_index :
>             type : custom
>             tokenizer : whitespace
>             filter : [standard, lowercase, hunspell_US, hunspell_IN, hindi, english]
>         default_search :
>             type : custom
>             tokenizer : whitespace
>             filter : [standard, lowercase, synonym, hunspell_US, hunspell_IN, hindi, english]
>       filter :
>         hindi:
>           tokenizer:  standard
>           filter: [lowercase]
>         english:
>           tokenizer:  standard
>           filter: [lowercase]
>         synonym :
>             type : synonym
>             ignore_case : true
>             expand : true
>             synonyms_path : synonyms.txt
>         hunspell_US :
>             type : hunspell
>             locale : en_US
>             dedup : false
>             ignore_case : true
>         hunspell_IN :
>             type : hunspell
>             locale : hi_IN
>             dedup : false
>             ignore_case : true
>
>
> After that, say I have configured the langdetect plugin and indexed some
> data in different languages, English and Hindi. Since I have configured
> multiple language analyzers and multilingual hunspell, will I be able to
> index and search each language with its own analyzer, and get results
> based on the tokens analyzed for each language?
>
> Will synonyms also work with different languages?
>
> ~Prashant
>
>
>
> --
> View this message in context:
> http://elasticsearch-users.115913.n3.nabble.com/Use-case-of-multiple-Language-Analyzer-Hunspell-along-with-Elasticsearch-Langdetect-Plugin-tp4063950.html
> Sent from the ElasticSearch Users mailing list archive at Nabble.com.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/1411556233114-4063950.post%40n3.nabble.com
> .
> For more options, visit https://groups.google.com/d/optout.
>

