Hi All,
We have an ES cluster that indexes a large amount of data in several different
languages. Our current settings use the English analyzer and an English
Hunspell dictionary. How can we index multilingual data with multiple language
analyzers and Hunspell dictionaries set up for the same index? (I came across
the Elasticsearch Langdetect Plugin
(https://github.com/jprante/elasticsearch-langdetect), which is available from
ES 1.2.1.)
Our current analyzer settings are:
index :
  analysis :
    analyzer :
      synonym :
        tokenizer : whitespace
        filter : [synonym]
      default_index :
        type : custom
        tokenizer : whitespace
        filter : [standard, lowercase, hunspell_US]
      default_search :
        type : custom
        tokenizer : whitespace
        filter : [standard, lowercase, synonym, hunspell_US]
    filter :
      synonym :
        type : synonym
        ignore_case : true
        expand : true
        synonyms_path : synonyms.txt
      hunspell_US :
        type : hunspell
        locale : en_US
        dedup : false
        ignore_case : true
So my questions are:
1) Can we configure multiple language analyzers and Hunspell dictionaries for
the same index, and then index data with the langdetect plugin configured for
specific fields? Would the data then be indexed and analyzed by the analyzer
for its language? And would it also be searchable through the multiple
Hunspell dictionaries and synonyms configured?
Please confirm whether the settings below can be used to achieve this:
analyzer :
  synonym :
    tokenizer : whitespace
    filter : [synonym]
  default_index :
    type : custom
    tokenizer : whitespace
    filter : [standard, lowercase, hunspell_US, hunspell_IN, hindi, english]
  default_search :
    type : custom
    tokenizer : whitespace
    filter : [standard, lowercase, synonym, hunspell_US, hunspell_IN, hindi, english]
filter :
  hindi :
    tokenizer : standard
    filter : [lowercase]
  english :
    tokenizer : standard
    filter : [lowercase]
  synonym :
    type : synonym
    ignore_case : true
    expand : true
    synonyms_path : synonyms.txt
  hunspell_US :
    type : hunspell
    locale : en_US
    dedup : false
    ignore_case : true
  hunspell_IN :
    type : hunspell
    locale : hi_IN
    dedup : false
    ignore_case : true
Then, say I have configured the langdetect plugin and indexed some data in
different languages, English and Hindi. Since I have configured multiple
language analyzers and multilingual Hunspell dictionaries, will I be able to
index and search each language with its own analyzer, and get results based on
the tokens analyzed for that language?
Also, will the synonyms work for the different languages as well?
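For comparison, another layout I am considering is one analyzer per language,
each on its own field, with our indexing code (or the langdetect result)
deciding which field a document's text goes into. This is only a rough sketch,
assuming ES 1.x accepts a YAML body on index creation and that a hi_IN Hunspell
dictionary is installed under config/hunspell. The type name (doc), field names
(content_en, content_hi) and analyzer names (english_hunspell, hindi_hunspell)
are hypothetical placeholders:

```yaml
# Hypothetical create-index body (sketch, ES 1.x assumed).
settings:
  analysis:
    filter:
      hunspell_US:
        type: hunspell
        locale: en_US        # English dictionary, as in our current setup
        ignore_case: true
      hunspell_IN:
        type: hunspell
        locale: hi_IN        # assumes a hi_IN dictionary is installed
        ignore_case: true
    analyzer:
      english_hunspell:      # hypothetical name: English text only
        type: custom
        tokenizer: standard
        filter: [lowercase, hunspell_US]
      hindi_hunspell:        # hypothetical name: Hindi text only
        type: custom
        tokenizer: standard
        filter: [lowercase, hunspell_IN]
mappings:
  doc:                       # placeholder type name
    properties:
      content_en:            # filled when the detected language is English
        type: string
        analyzer: english_hunspell
      content_hi:            # filled when the detected language is Hindi
        type: string
        analyzer: hindi_hunspell
```

Here each field passes through exactly one Hunspell dictionary, whereas the
chained default_index analyzer above runs every token through both hunspell_US
and hunspell_IN.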
~Prashant
--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/Use-case-of-multiple-Language-Analyzer-Hunspell-along-with-Elasticsearch-Langdetect-Plugin-tp4063950.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.