Despite my name, I do not speak Russian. :) Please excuse my ignorance of the Russian language while I attempt to debug.
Currently, the synonym token filter is applied after the other three token filters: "snowball_text", "lowercase", and "russian_morphology". In this case, the synonym filter performs its key lookups on terms that have already been stemmed and lowercased (I do not know what russian_morphology provides), so a stemmed Russian token will likely no longer match the unstemmed key in synonym.txt. Try moving your synonym filter before any stemming. Placing it after lowercasing is fine, as long as the entries in your synonym map are lowercase (or you set ignore_case to true). In your example, foo/bar/baz are not altered by stemming, so they work as is.

Cheers,
Ivan

On Thu, Mar 6, 2014 at 2:39 AM, Владимир Руденко <[email protected]> wrote:

> Hi.
> I have a test index with these settings:
> curl -XPOST 'http://localhost:9200/test_index' -d '
> {
>   "settings" : {
>     "number_of_shards" : 5,
>     "language":"javascript",
>     "analysis": {
>       "filter": {
>         "snowball_text" : {
>           "type": "snowball",
>           "language": "Russian"
>         },
>         "synonym" : {
>           "type" : "synonym",
>           "synonyms_path" : "synonym.txt"
>         }
>       },
>       "analyzer": {
>         "search" : {
>           "type" :"custom",
>           "tokenizer": "standard",
>           "filter": ["snowball_text", "lowercase", "russian_morphology", "synonym"]
>         }
>       }
>     }
>   },
>   "mappings" : {
>     "test_type" : {
>       "properties" : {
>         "test" : {
>           "type" : "string",
>           "analyzer" : "search"
>         },
>         "description" : {
>           "type" : "string",
>           "analyzer" : "search"
>         }
>       }
>     }
>   }
> }'
>
> File synonym.txt:
> продажа => купить
> аренда => арендовать, сниму, снять
> foo => foo bar, baz
>
> English words work fine:
> curl -XGET 'http://localhost:9200/test_index/_analyze?text=foo&analyzer=search&pretty=true'
> {
>   "tokens" : [ {
>     "token" : "foo",
>     "start_offset" : 0,
>     "end_offset" : 3,
>     "type" : "SYNONYM",
>     "position" : 1
>   }, {
>     "token" : "baz",
>     "start_offset" : 0,
>     "end_offset" : 3,
>     "type" : "SYNONYM",
>     "position" : 1
>   }, {
>     "token" : "bar",
>     "start_offset" : 0,
>     "end_offset" : 3,
>     "type" : "SYNONYM",
>     "position" : 2
>   } ]
> }
>
> But Russian:
> curl -XGET 'http://localhost:9200/test_index/_analyze?text=продажа&analyzer=search&pretty=true'
> {
>   "tokens" : [ {
>     "token" : "タ",
>     "start_offset" : 3,
>     "end_offset" : 4,
>     "type" : "<KATAKANA>",
>     "position" : 1
>   }, {
>     "token" : "ᄒ",
>     "start_offset" : 5,
>     "end_offset" : 6,
>     "type" : "<HANGUL>",
>     "position" : 2
>   }, {
>     "token" : "ᄡ",
>     "start_offset" : 7,
>     "end_offset" : 8,
>     "type" : "<HANGUL>",
>     "position" : 3
>   }, {
>     "token" : "ᄚ",
>     "start_offset" : 9,
>     "end_offset" : 10,
>     "type" : "<HANGUL>",
>     "position" : 4
>   }, {
>     "token" : "ᄊ",
>     "start_offset" : 11,
>     "end_offset" : 12,
>     "type" : "<HANGUL>",
>     "position" : 5
>   }, {
>     "token" : "ᄚ",
>     "start_offset" : 13,
>     "end_offset" : 14,
>     "type" : "<HANGUL>",
>     "position" : 6
>   } ]
> }
>
> I can't understand what I'm doing wrong.
>
> --
> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/481aacc3-d892-43e3-9024-65d84dcffe56%40googlegroups.com<https://groups.google.com/d/msgid/elasticsearch/481aacc3-d892-43e3-9024-65d84dcffe56%40googlegroups.com?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/groups/opt_out.
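For reference, here is an untested sketch of the analysis settings with the synonym filter moved ahead of the stemmers, following the suggestion in the reply above: lowercase first, then synonym, then snowball_text and russian_morphology. It assumes russian_morphology behaves like a stemming step (its exact output isn't known here) and that the entries in synonym.txt are already lowercase, so ignore_case is left at its default. Only the analysis part of the original index body is shown; number_of_shards and the mappings would stay as they were, and test_index would need to be recreated with these settings before comparing the _analyze output again.

```json
{
  "settings" : {
    "analysis": {
      "filter": {
        "snowball_text" : {
          "type": "snowball",
          "language": "Russian"
        },
        "synonym" : {
          "type" : "synonym",
          "synonyms_path" : "synonym.txt"
        }
      },
      "analyzer": {
        "search" : {
          "type" : "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "synonym", "snowball_text", "russian_morphology"]
        }
      }
    }
  }
}
```

With this order the synonym lookup sees the lowercased but unstemmed word, which is the form written in synonym.txt; the synonyms it emits are then stemmed along with everything else.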
