I am trying to get auto-suggest working for a large list of inverted
names, using the most recent auto-suggest implementation (the completion
suggester, which is built on Lucene FSTs):

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-completion.html

I have gotten it to work in general, but it has trouble with inverted
names such as "Gettysburg, battle of". I get hits for "G", "Get", and so
on, but if I start typing "battle of..." I get no results. Below are the
auto-suggest mapping I am using and a sample document that I am indexing:

*Mapping:*

{
    "settings": {
        "number_of_shards": 1
    },
    "mappings": {
        "person": {
            "_source": {
                "enabled": true
            },
            "properties": {
                "name": {
                    "type": "string",
                    "index_name": "name",
                    "index": "analyzed"
                },
                "suggest": {
                    "type": "completion",
                    "index_analyzer": "standard",
                    "search_analyzer": "standard",
                    "preserve_position_increments": false,
                    "preserve_separators": false,
                    "payloads": true,
                    "context": {
                        "type": { "type": "category", "path": "_type" }
                    }
                }
            }
        }
    }
}

*Indexed Document:*

{
    "id": "1",
    "name": "Gettysburg, battle of",
    "suggest": {
        "input": [
            "Gettysburg, battle of"
        ],
        "output": "Gettysburg, battle of",
        "payload": {
            "id": "1"
        }
    }
}
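A simplified model may help show why "Get" matches but "battle of" does
not. It assumes that the completion suggester only matches prefixes of
each whole analyzed input (this is an approximation, not Lucene's FST
code). It mimics the standard analyzer plus "preserve_separators": false
by lowercasing, splitting on non-alphanumerics, and joining the tokens with
no separator; stopwords and fuzzy matching are ignored. The function names
analyze and suggest are hypothetical.

```python
import re


def analyze(text: str) -> str:
    """Rough stand-in for the standard analyzer with preserve_separators=false:
    lowercase, keep alphanumeric runs as tokens, join them with no separator."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return "".join(tokens)


def suggest(inputs: list[str], query: str) -> list[str]:
    """Return each input whose analyzed form starts with the analyzed query.
    Matching is anchored at the start of the whole input (prefix lookup)."""
    q = analyze(query)
    return [i for i in inputs if analyze(i).startswith(q)]


inputs = ["Gettysburg, battle of"]
print(suggest(inputs, "Get"))        # ['Gettysburg, battle of']
print(suggest(inputs, "battle of"))  # []
```

Under this model, "battle of" analyzes to "battleof", which is not a prefix
of "gettysburgbattleof", so nothing is returned.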

I think the problem has to do with the index and search analyzers, but I
have tried a variety of types and combinations and cannot get it to work.
The only option that has worked is a script that pre-processes the data
and looks for inverted names. It finds strings like "Gettysburg, battle
of", checks whether they contain a ",", and if so, splits on the comma and
emits both the first and second parts as additional "input" values. In
this case "Gettysburg" and "battle of" would be input strings in addition
to "Gettysburg, battle of". This seems very wasteful and unnecessary.
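A minimal sketch of the comma-splitting pre-processing described above,
not the actual script. It splits on the first comma only (matching "the
first part and second part"), and the field layout follows the indexed
document sample. The helper names build_suggest_inputs and
build_suggest_field are hypothetical.

```python
import json


def build_suggest_inputs(name: str) -> list[str]:
    """Return the full name plus, if it contains a comma, the text before
    and after the first comma as extra completion inputs."""
    inputs = [name]
    if "," in name:
        for part in name.split(",", 1):
            part = part.strip()
            # Skip empty fragments and duplicates of inputs already added.
            if part and part not in inputs:
                inputs.append(part)
    return inputs


def build_suggest_field(doc_id: str, name: str) -> dict:
    """Build the "suggest" object in the same shape as the sample document."""
    return {
        "input": build_suggest_inputs(name),
        "output": name,
        "payload": {"id": doc_id},
    }


doc = {
    "id": "1",
    "name": "Gettysburg, battle of",
    "suggest": build_suggest_field("1", "Gettysburg, battle of"),
}
print(json.dumps(doc, indent=2))
```

Because "output" stays the full name, typing "battle" would still display
"Gettysburg, battle of" as the suggestion.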

Any help or suggestions would be appreciated.

Thanks,

Jeff

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7758a374-4450-4f9a-8d33-e957ace02af1%40googlegroups.com.