Re: How to use ElasticSearch to implement Autocompleter ?

joa Fri, 17 Jan 2014 08:37:59 -0800

You can index a term in multiple ways with the completion suggester. The
blog post at http://www.elasticsearch.org/blog/you-complete-me/ uses hotel
bookings as its use case:
curl -X PUT localhost:9200/hotels/hotel/1 -d '
{
  "name" :         "Mercure Hotel Munich",
  "city" :         "Munich",
  "name_suggest" : {
    "input" :      [
      "Mercure Hotel Munich",
      "Mercure Munich",
      "ADD OTHER WORD COMBINATIONS HERE..." 
    ]
  }
}'
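
One way to fill in the extra word combinations is to generate them before indexing. This is only a minimal sketch, assuming a plain whitespace split; the helper names suggest_inputs and hotel_document are hypothetical and not part of Elasticsearch:

```python
import json


def suggest_inputs(name):
    """Return the full name plus every trailing word sequence of it.

    Hypothetical helper: a user who starts typing at any word of the
    name ("Hotel ...", "Munich") still types a prefix of some input.
    """
    words = name.split()
    inputs = []
    for start in range(len(words)):
        candidate = " ".join(words[start:])
        if candidate not in inputs:  # drop duplicates, keep order
            inputs.append(candidate)
    return inputs


def hotel_document(name, city):
    """Build a document body shaped like the curl example above."""
    return {
        "name": name,
        "city": city,
        "name_suggest": {"input": suggest_inputs(name)},
    }


if __name__ == "__main__":
    doc = hotel_document("Mercure Hotel Munich", "Munich")
    print(json.dumps(doc, indent=2))
```

With inputs generated this way, typing "Hot" or "Mun" is a prefix of one of the indexed inputs, not only "Mer".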


If by exact matches you mean that you also want fuzzy suggestions (i.e.
suggestions despite misspellings), you can set the fuzzy parameter:

curl -X POST 'localhost:9200/music/_suggest?pretty' -d '{
    "song-suggest" : {
        "text" : "n",
        "completion" : {
            "field" : "suggest",
            "fuzzy" : {
                "edit_distance" : 2
            }
        }
    }
}'
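
To get a feel for what an edit distance of 2 tolerates, here is a small sketch of the classic Levenshtein distance. It is an illustration only: the function names are hypothetical, and Elasticsearch's own fuzzy matching may differ in details (for example in how it counts transpositions).

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def within_fuzzy_limit(typed, target, edit_distance=2):
    """True if the typed text is at most edit_distance edits from target."""
    return levenshtein(typed, target) <= edit_distance


if __name__ == "__main__":
    for typed in ("munich", "mnich", "munihc"):
        print(typed, levenshtein(typed, "munich"))
```

Under this definition a dropped letter costs one edit and two swapped letters cost two, so both would still fall within an edit distance of 2.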






On Friday, January 17, 2014 5:20:36 PM UTC+1, coder wrote:
>
> But the problem still remains. The completion suggester returns results 
> only when there is an exact match, but as mentioned previously, a user on 
> a travel website can phrase a query in many different ways.
>
> Thanks
>
>
> On Fri, Jan 17, 2014 at 9:41 PM, joa wrote:
>
>> You should look at the completion suggester, added in 0.90.3, instead 
>> of using edge n-grams.
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-completion.html
>> http://www.elasticsearch.org/blog/you-complete-me/
>>
>>
>> On Friday, January 17, 2014 5:04:14 PM UTC+1, coder wrote:
>>>
>>> Hi,
>>>
>>> I'm trying to use Elasticsearch to implement an autocompleter for my 
>>> college project, much like the ones on travel websites, but I'm 
>>> facing some issues with the implementation.
>>>
>>> I'm using the following settings and mapping:
>>>
>>> curl -XPUT 'http://localhost:9200/auto_index/' -d '{
>>>      "settings" : {
>>>         "index" : {
>>>             "number_of_shards" : 1,
>>>             "number_of_replicas" : 1,
>>>             "analysis" : {
>>>                "analyzer" : {
>>>                   "str_search_analyzer" : {
>>>                       "tokenizer" : "standard",
>>>                       "filter" : ["lowercase","asciifolding","suggestions_shingle","edgengram"]
>>>                    },
>>>                    "str_index_analyzer" : {
>>>                      "tokenizer" : "standard",
>>>                      "filter" : ["lowercase","asciifolding","suggestions_shingle","edgengram"]
>>>                   }
>>>                },
>>>                "filter" : {
>>>                    "suggestions_shingle": {
>>>                        "type": "shingle",
>>>                        "min_shingle_size": 2,
>>>                        "max_shingle_size": 5
>>>                   },
>>>                   "edgengram" : {
>>>                       "type" : "edgeNGram",
>>>                       "min_gram" : 2,
>>>                       "max_gram" : 30,
>>>                       "side"     : "front"
>>>                   },
>>>                   "mynGram" : {
>>>                         "type" : "nGram",
>>>                         "min_gram" : 2,
>>>                         "max_gram" : 30
>>>                   }
>>>               }
>>>           },
>>>           "similarity" : {
>>>                      "index": {
>>>                              "type": "org.elasticsearch.index.similarity.CustomSimilarityProvider"
>>>                      },
>>>                      "search": {
>>>                              "type": "org.elasticsearch.index.similarity.CustomSimilarityProvider"
>>>                      }
>>>           }    
>>>      }
>>>   }
>>> }'
>>>
>>> curl -XPUT 'localhost:9200/auto_index/autocomplete/_mapping' -d '{
>>>     "autocomplete":{
>>>        "_boost" : {
>>>             "name" : "po", 
>>>             "null_value" : 4.0
>>>        },
>>>        "properties": {
>>>                 "ad": {
>>>                     "type": "string",
>>>                     "search_analyzer" : "str_search_analyzer",
>>>                     "index_analyzer" : "str_index_analyzer",
>>>                     "omit_norms": "true",
>>>                     "similarity": "index"
>>>                 },
>>>                 "category": {
>>>                     "type": "string",
>>>                     "include_in_all" : false
>>>                 },
>>>                 "cn": {
>>>                     "type": "string",
>>>                     "search_analyzer" : "str_search_analyzer",
>>>                     "index_analyzer" : "str_index_analyzer",
>>>                     "omit_norms": "true",
>>>                     "similarity": "index"
>>>                 },
>>>                 "ctype": {
>>>                     "type": "string",
>>>                     "search_analyzer" : "keyword",
>>>                     "index_analyzer" : "keyword",
>>>                     "omit_norms": "true",
>>>                     "similarity": "index"
>>>                 },
>>>                 "eid": {
>>>                     "type": "string",
>>>                     "include_in_all" : false
>>>                 },
>>>                 "st": {
>>>                     "type": "string",
>>>                     "search_analyzer" : "str_search_analyzer",
>>>                     "index_analyzer" : "str_index_analyzer",
>>>                     "omit_norms": "true",
>>>                     "similarity": "index"
>>>                 },
>>>                 "co": {
>>>                     "type": "string",
>>>                     "search_analyzer" : "str_search_analyzer",
>>>                     "index_analyzer" : "str_index_analyzer",
>>>                     "omit_norms": "true",
>>>                     "similarity": "index"
>>>                 },
>>>                 "po": {
>>>                     "type": "double",
>>>                     "boost": 4.0
>>>                 },
>>>                 "en":{
>>>                     "type": "boolean"
>>>                 },
>>>                 "_oid":{
>>>                     "type": "long"
>>>                 },
>>>                 "text": {
>>>                     "type": "string",
>>>                     "search_analyzer" : "str_search_analyzer",
>>>                     "index_analyzer" : "str_index_analyzer",
>>>                     "omit_norms": "true",
>>>                     "similarity": "index"
>>>                 },
>>>                 "url": {
>>>                     "type": "string"
>>>                 }               
>>>          }
>>>      }
>>> }'
>>>
>>> Then, in my Java code, I build the query like this:
>>>
>>> // Multiply the relevance score by the popularity field "po",
>>> // falling back to 1 when "po" is missing or zero.
>>> String script = "_score * (doc['po'].empty ? 1 : doc['po'].value == 0.0 ? 1 : doc['po'].value)";
>>> QueryBuilder queryBuilder = QueryBuilders.customScoreQuery(
>>>         QueryBuilders.queryString(query)
>>>             .field("text", 30)
>>>             .field("ad")
>>>             .field("st")
>>>             .field("cn")
>>>             .field("co")
>>>             .defaultOperator(Operator.AND))
>>>     .script(script);
>>>
>>> Explanation of the fields:
>>> text: contains statements like "things to do in goa"
>>> ad: address
>>> st: state
>>> cn: city name
>>> co: country
>>>
>>> Now, if I type "things to do in" into my autocompleter box, I get 
>>> these results:
>>>
>>> things to do in rann
>>> things to do in bulandshahr
>>> things to do in gondai
>>> things to do in rewa
>>> things to do in goa
>>>
>>> But I want "things to do in goa" on top.
>>>
>>> Earlier, I thought idf in Elasticsearch was causing the problem, so I 
>>> overrode the default similarity with a CustomSimilarity that sets idf 
>>> to 1. But that still did not solve my problem. Instead, it started 
>>> putting results like this on top:
>>>
>>> things to do in toronto
>>>
>>> I think I may be doing something wrong in my index_analyzer and 
>>> search_analyzer. I tried other tokenizers and token filters in different 
>>> orders but could not find a solution.
>>>
>>> I could have implemented a simple prefix autocompleter, but then it 
>>> would make little sense to use Elasticsearch, since matching terms in 
>>> the middle of a sentence gives the user more flexibility. Also, in the 
>>> travel industry a person can search for the same thing in different 
>>> ways. For example, instead of typing exactly "things to do in", he or 
>>> she might write "what are the best things to do in" or "what are things 
>>> to do", among many other possibilities. A prefix autocompleter would not 
>>> handle that effectively. That's why I tried implementing the 
>>> autocompleter with Elasticsearch, but I'm not doing it the right way.
>>>
>>> For better results, I also introduced a popularity factor that is 
>>> updated on every user click, so that a clicked entry's score increases 
>>> in later searches via the custom score query. I also give the text field 
>>> a boost of 30 and the other fields a lower weight. But something is 
>>> still not right.
>>>
>>> I guess I'm not using Elasticsearch's capabilities properly for my 
>>> use case. Can you please help me with this?
>>>
>>> Thanks
>>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/3fb42188-c58a-4ab0-bcb8-48c1b075eb71%40googlegroups.com.
>>
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>
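
On the analyzer chain in the quoted question: a rough simulation may help show why the analyzers alone cannot rank "things to do in goa" first. The sketch below makes simplifying assumptions: whitespace splitting instead of the standard tokenizer, no asciifolding, and the shingle filter also emitting single words. The function names are hypothetical.

```python
def shingles(words, min_size=2, max_size=5):
    """Single words plus every run of min_size..max_size adjacent words."""
    out = list(words)
    for size in range(min_size, max_size + 1):
        for start in range(len(words) - size + 1):
            out.append(" ".join(words[start:start + size]))
    return out


def edge_ngrams(token, min_gram=2, max_gram=30):
    """Front edge n-grams: prefixes of length min_gram..max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze(text):
    """Approximate term set: lowercase, shingle, then edge n-gram."""
    words = text.lower().split()
    terms = set()
    for token in shingles(words):
        terms.update(edge_ngrams(token))
    return terms


if __name__ == "__main__":
    query_terms = analyze("things to do in")
    for title in ("things to do in goa", "things to do in rewa"):
        # Every query term is also a term of each title.
        print(title, query_terms <= analyze(title))
```

In this simulation every term produced for the query is also produced for each "things to do in ..." title, so the titles match on the same terms and the text match gives little basis for preferring one over another.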

