Despite my name, I do not speak Russian. :) Please excuse my ignorance of
the Russian language while I attempt to debug.

Currently, the synonym token filter is applied after the other three
token filters: "snowball_text", "lowercase", and "russian_morphology". As a
result, the synonym mapping performs its key lookups on terms that have
already been stemmed and lowercased (I do not know what russian_morphology
does to them), so a key written in its full form in synonym.txt may no
longer match. Try moving your synonym filter before any stemming. Placing
it after lowercasing is fine, as long as your synonym map has lowercased
entries (or you set ignore_case to true). In your example, foo/bar/baz are
not changed by stemming, so they work as is.
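
For illustration, the reordered analysis section might look like the sketch
below. It is untested and rests on two assumptions of mine: I have placed
"russian_morphology" after the synonym filter on the guess that it also
rewrites terms, and "ignore_case" is only needed if synonym.txt contains
mixed-case entries. The filter definitions themselves are the same as in
your settings.

```json
{
  "analysis": {
    "filter": {
      "snowball_text": {
        "type": "snowball",
        "language": "Russian"
      },
      "synonym": {
        "type": "synonym",
        "synonyms_path": "synonym.txt",
        "ignore_case": true
      }
    },
    "analyzer": {
      "search": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["lowercase", "synonym", "snowball_text", "russian_morphology"]
      }
    }
  }
}
```

You can check the result with the same _analyze calls you used before.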

Cheers,

Ivan


On Thu, Mar 6, 2014 at 2:39 AM, Владимир Руденко <[email protected]> wrote:

> Hi.
> I have a test index with these settings:
> curl -XPOST 'http://localhost:9200/test_index' -d '
> {
>     "settings" : {
>           "number_of_shards" : 5,
>           "language":"javascript",
>           "analysis": {
>                      "filter": {
>                           "snowball_text" : {
>                                 "type": "snowball",
>                                 "language": "Russian"
>                             },
>                           "synonym" : {
>                                 "type" : "synonym",
>                                 "synonyms_path" : "synonym.txt"
>                           }
>                      },
>                      "analyzer": {
>                         "search" : {
>                             "type" :"custom",
>                             "tokenizer": "standard",
>                             "filter": ["snowball_text", "lowercase", "russian_morphology", "synonym"]
>                         }
>                 }
>           }
>     },
>     "mappings" : {
>         "test_type" : {
>             "properties" : {
>                 "test" : {
>                     "type" : "string",
>                     "analyzer" : "search"
>                 },
>                 "description" : {
>                     "type" : "string",
>                     "analyzer" : "search"
>                 }
>             }
>         }
>     }
> }'
>
> File synonym.txt:
> продажа => купить
> аренда => арендовать, сниму, снять
> foo => foo bar, baz
>
> English words work fine:
> curl -XGET 'http://localhost:9200/test_index/_analyze?text=foo&analyzer=search&pretty=true'
> {
>   "tokens" : [ {
>     "token" : "foo",
>     "start_offset" : 0,
>     "end_offset" : 3,
>     "type" : "SYNONYM",
>     "position" : 1
>   }, {
>     "token" : "baz",
>     "start_offset" : 0,
>     "end_offset" : 3,
>     "type" : "SYNONYM",
>     "position" : 1
>   }, {
>     "token" : "bar",
>     "start_offset" : 0,
>     "end_offset" : 3,
>     "type" : "SYNONYM",
>     "position" : 2
>   } ]
> }
>
> But Russian words do not:
> curl -XGET 'http://localhost:9200/test_index/_analyze?text=продажа&analyzer=search&pretty=true'
> {
>   "tokens" : [ {
>     "token" : "タ",
>     "start_offset" : 3,
>     "end_offset" : 4,
>     "type" : "<KATAKANA>",
>     "position" : 1
>   }, {
>     "token" : "ᄒ",
>     "start_offset" : 5,
>     "end_offset" : 6,
>     "type" : "<HANGUL>",
>     "position" : 2
>   }, {
>     "token" : "ᄡ",
>     "start_offset" : 7,
>     "end_offset" : 8,
>     "type" : "<HANGUL>",
>     "position" : 3
>   }, {
>     "token" : "ᄚ",
>     "start_offset" : 9,
>     "end_offset" : 10,
>     "type" : "<HANGUL>",
>     "position" : 4
>   }, {
>     "token" : "ᄊ",
>     "start_offset" : 11,
>     "end_offset" : 12,
>     "type" : "<HANGUL>",
>     "position" : 5
>   }, {
>     "token" : "ᄚ",
>     "start_offset" : 13,
>     "end_offset" : 14,
>     "type" : "<HANGUL>",
>     "position" : 6
>   } ]
> }
>
> I can't understand what I'm doing wrong.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/481aacc3-d892-43e3-9024-65d84dcffe56%40googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>

