Hi.
I have a test index with the following settings:
curl -XPOST 'http://localhost:9200/test_index' -d '
{
    "settings" : {
          "number_of_shards" : 5,
          "language":"javascript",
          "analysis": {
                     "filter": {
                          "snowball_text" : {
                                "type": "snowball",
                                "language": "Russian"
                            },
                          "synonym" : {
                                "type" : "synonym",
                                "synonyms_path" : "synonym.txt"
                          }
                     },
                     "analyzer": {
                        "search" : {
                            "type" :"custom",
                            "tokenizer": "standard",
                            "filter": ["snowball_text", "lowercase", 
"russian_morphology", "synonym"]
                        }
                }
          }
    },
    "mappings" : {
        "test_type" : {
            "properties" : {
                "test" : {
                    "type" : "string",
                    "analyzer" : "search"
                },
                "description" : {
                    "type" : "string",
                    "analyzer" : "search"
                }
            }
        }
    }
}'

File synonym.txt:
продажа => купить
аренда => арендовать, сниму, снять
foo => foo bar, baz

English words work fine:
curl -XGET 'http://localhost:9200/test_index/_analyze?text=foo&analyzer=search&pretty=true'
{
  "tokens" : [ {
    "token" : "foo",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "SYNONYM",
    "position" : 1
  }, {
    "token" : "baz",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "SYNONYM",
    "position" : 1
  }, {
    "token" : "bar",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "SYNONYM",
    "position" : 2
  } ]
}

But Russian words do not:
curl -XGET 'http://localhost:9200/test_index/_analyze?text=продажа&analyzer=search&pretty=true'
{
  "tokens" : [ {
    "token" : "タ",
    "start_offset" : 3,
    "end_offset" : 4,
    "type" : "<KATAKANA>",
    "position" : 1
  }, {
    "token" : "ᄒ",
    "start_offset" : 5,
    "end_offset" : 6,
    "type" : "<HANGUL>",
    "position" : 2
  }, {
    "token" : "ᄡ",
    "start_offset" : 7,
    "end_offset" : 8,
    "type" : "<HANGUL>",
    "position" : 3
  }, {
    "token" : "ᄚ",
    "start_offset" : 9,
    "end_offset" : 10,
    "type" : "<HANGUL>",
    "position" : 4
  }, {
    "token" : "ᄊ",
    "start_offset" : 11,
    "end_offset" : 12,
    "type" : "<HANGUL>",
    "position" : 5
  }, {
    "token" : "ᄚ",
    "start_offset" : 13,
    "end_offset" : 14,
    "type" : "<HANGUL>",
    "position" : 6
  } ]
}

I can't understand what I'm doing wrong.
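
One thing that may be worth ruling out is the encoding of the request itself rather than the analyzer chain. The last end_offset in the Russian output is 14, which happens to equal the number of UTF-8 bytes in "продажа" (7 letters, 2 bytes each), so the raw, unencoded query parameter may have been read byte by byte. The sketch below is only a diagnostic under that assumption. The helper names (urlencode, analyze_encoded, analyze_body) are hypothetical, and the second variant assumes this Elasticsearch version accepts the text to analyze as the request body.

```shell
#!/bin/sh
# Diagnostic sketch, assuming the garbled tokens come from the request
# encoding and not from the "search" analyzer definition.

# Hypothetical helper: percent-encode every byte of the argument.
# Encoding ASCII bytes too is redundant but still valid in a URL.
urlencode() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n' | tr 'a-f' 'A-F' | sed 's/\(..\)/%\1/g'
}

# Variant 1: pass the text percent-encoded in the query string.
analyze_encoded() {
    curl -XGET "http://localhost:9200/test_index/_analyze?analyzer=search&pretty=true&text=$(urlencode "$1")"
}

# Variant 2: pass the text as the request body instead of a query parameter
# (assumption: this Elasticsearch version reads the body as the text).
analyze_body() {
    curl -XGET 'http://localhost:9200/test_index/_analyze?analyzer=search&pretty=true' -d "$1"
}
```

Calling analyze_encoded 'продажа' or analyze_body 'продажа' against the same index should show whether the tokens come back in Cyrillic. If they still come back garbled, the request encoding is probably not the cause.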
