Use case of multiple Language Analyzer, Hunspell along with Elasticsearch Langdetect Plugin

Prashant Agrawal Wed, 24 Sep 2014 03:57:35 -0700

Hi All,

We have an ES cluster that indexes a large amount of data in several
languages. Our current settings use only an English analyzer and an English
hunspell dictionary. How can we index multilingual data with multilingual
analyzers and hunspell dictionaries in the same index? I came across the
Elasticsearch Langdetect Plugin
(https://github.com/jprante/elasticsearch-langdetect), which is available
from ES 1.2.1.


Our current analyzer settings are:
index :
  analysis :
      analyzer :
        synonym : 
            tokenizer : whitespace
            filter : [synonym]
        default_index :
            type : custom
            tokenizer : whitespace
            filter : [ standard, lowercase,hunspell_US]  
        default_search :
            type : custom
            tokenizer : whitespace
            filter : [standard, lowercase, synonym,hunspell_US]  
      filter :
        synonym : 
            type : synonym
            ignore_case : true
            expand : true
            synonyms_path : synonyms.txt
        hunspell_US :
            type : hunspell
            locale : en_US 
            dedup : false
            ignore_case : true


So my questions are:
1) Can we configure multiple language analyzers and hunspell dictionaries for
the same index, and then index data by configuring the langdetect plugin for
specific fields? Will the data then be indexed and analyzed by the analyzer
for its language? And will it be searchable with the multiple hunspell
dictionaries and synonyms configured?

Please confirm whether the settings below can be used to achieve this:

      analyzer :
        synonym : 
            tokenizer : whitespace
            filter : [synonym]
        default_index :
            type : custom
            tokenizer : whitespace
            filter : [standard, lowercase, hunspell_US, hunspell_IN, hindi, english]
        default_search :
            type : custom
            tokenizer : whitespace
            filter : [standard, lowercase, synonym, hunspell_US, hunspell_IN, hindi, english]
      filter :
        hindi: 
          tokenizer:  standard
          filter: [lowercase]
        english: 
          tokenizer:  standard
          filter: [lowercase]
        synonym : 
            type : synonym
            ignore_case : true
            expand : true
            synonyms_path : synonyms.txt
        hunspell_US :
            type : hunspell
            locale : en_US 
            dedup : false
            ignore_case : true
        hunspell_IN :
            type : hunspell
            locale : hi_IN 
            dedup : false
            ignore_case : true

                        
After that, say I have configured the langdetect plugin and indexed some data
in different languages, English and Hindi. Since I have configured multiple
language analyzers and multilingual hunspell, will I be able to index and
search each language with its own analyzer, and get results based on the
tokens analyzed for that language?
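
To make the "own analyzer per language" idea concrete, here is a minimal
sketch of the layout I have in mind as an alternative to chaining everything
into default_index and default_search. The analyzer names english_text and
hindi_text are hypothetical, and it assumes a hi_IN hunspell dictionary is
installed next to the en_US one. How the langdetect plugin would route a
field to one of these analyzers is not shown, since that is part of what I am
asking.

```yaml
index :
  analysis :
    analyzer :
      # hypothetical name: analyzer for fields holding English text
      english_text :
        type : custom
        tokenizer : standard
        filter : [lowercase, hunspell_US]
      # hypothetical name: analyzer for fields holding Hindi text
      hindi_text :
        type : custom
        tokenizer : standard
        filter : [lowercase, hunspell_IN]
    filter :
      hunspell_US :
        type : hunspell
        locale : en_US
        dedup : false
      # assumes a hi_IN dictionary is available to hunspell
      hunspell_IN :
        type : hunspell
        locale : hi_IN
        dedup : false
```

With this layout each language-specific field would name its analyzer in the
mapping, so English tokens would never pass through the Hindi dictionary and
vice versa.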

Will synonyms also work across the different languages?

~Prashant


