Re: How to use ElasticSearch to implement Autocompleter ?

joa Fri, 17 Jan 2014 08:12:21 -0800

You should look at the completion suggester added in 0.90.3 instead of 
using edge ngrams.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-completion.html
http://www.elasticsearch.org/blog/you-complete-me/
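Roughly, the completion suggester keeps its suggestions in an in-memory FST, looks them up by prefix, and orders them by a weight supplied at index time. A popularity value like `po` would therefore become the weight rather than a script factor. The Java below is only a toy model of that idea, not Elasticsearch code: `WeightedPrefixCompleter`, `add` and `suggest` are hypothetical names, and a `TreeMap` range scan stands in for the FST.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical toy model of weighted prefix completion (not Elasticsearch code).
// A sorted TreeMap stands in for the FST the completion suggester builds.
class WeightedPrefixCompleter {
    // suggestion input -> weight (for example a popularity value such as "po")
    private final TreeMap<String, Integer> entries = new TreeMap<>();

    void add(String input, int weight) {
        entries.put(input.toLowerCase(), weight);
    }

    // Return up to 'size' inputs starting with 'prefix', highest weight first.
    List<String> suggest(String prefix, int size) {
        String p = prefix.toLowerCase();
        // Every key in [p, p + '\uffff') starts with p, so this is a prefix scan.
        NavigableMap<String, Integer> range =
                entries.subMap(p, true, p + Character.MAX_VALUE, false);
        List<Map.Entry<String, Integer>> hits = new ArrayList<>(range.entrySet());
        // Stable sort: equal weights keep their alphabetical order.
        hits.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(size, hits.size()); i++) {
            out.add(hits.get(i).getKey());
        }
        return out;
    }
}
```

For example, after adding "things to do in goa" with weight 50 and "things to do in rewa" with weight 3, `suggest("things to do in", 2)` returns the goa entry first. In this model, changing an entry's popularity means re-adding it with a new weight.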



On Friday, January 17, 2014 5:04:14 PM UTC+1, coder wrote:
>
> Hi,
>
> I'm trying to use Elasticsearch to implement an autocompleter for my 
> college project, similar to the ones some travel websites use, but I'm 
> facing some issues with the implementation.
>
> I'm using the following settings and mapping:
>
> curl -XPUT 'http://localhost:9200/auto_index/' -d '{
>      "settings" : {
>         "index" : {
>             "number_of_shards" : 1,
>             "number_of_replicas" : 1,
>             "analysis" : {
>                "analyzer" : {
>                   "str_search_analyzer" : {
>                       "tokenizer" : "standard",
>                       "filter" : ["lowercase","asciifolding","suggestions_shingle","edgengram"]
>                    },
>                    "str_index_analyzer" : {
>                      "tokenizer" : "standard",
>                      "filter" : ["lowercase","asciifolding","suggestions_shingle","edgengram"]
>                   }
>                },
>                "filter" : {
>                    "suggestions_shingle": {
>                        "type": "shingle",
>                        "min_shingle_size": 2,
>                        "max_shingle_size": 5
>                   },
>                   "edgengram" : {
>                       "type" : "edgeNGram",
>                       "min_gram" : 2,
>                       "max_gram" : 30,
>                       "side"     : "front"
>                   },
>                   "mynGram" : {
>                         "type" : "nGram",
>                         "min_gram" : 2,
>                         "max_gram" : 30
>                   }
>               }
>           },
>           "similarity" : {
>                      "index": {
>                              "type": "org.elasticsearch.index.similarity.CustomSimilarityProvider"
>                      },
>                      "search": {
>                              "type": "org.elasticsearch.index.similarity.CustomSimilarityProvider"
>                      }
>           }    
>      }
>   }
> }'
>
> curl -XPUT 'localhost:9200/auto_index/autocomplete/_mapping' -d '{
>     "autocomplete":{
>        "_boost" : {
>             "name" : "po", 
>             "null_value" : 4.0
>        },
>        "properties": {
>                 "ad": {
>                     "type": "string",
>                     "search_analyzer" : "str_search_analyzer",
>                     "index_analyzer" : "str_index_analyzer",
>                     "omit_norms": "true",
>                     "similarity": "index"
>                 },
>                 "category": {
>                     "type": "string",
>                     "include_in_all" : false
>                 },
>                 "cn": {
>                     "type": "string",
>                     "search_analyzer" : "str_search_analyzer",
>                     "index_analyzer" : "str_index_analyzer",
>                     "omit_norms": "true",
>                     "similarity": "index"
>                 },
>                 "ctype": {
>                     "type": "string",
>                     "search_analyzer" : "keyword",
>                     "index_analyzer" : "keyword",
>                     "omit_norms": "true",
>                     "similarity": "index"
>                 },
>                 "eid": {
>                     "type": "string",
>                     "include_in_all" : false
>                 },
>                 "st": {
>                     "type": "string",
>                     "search_analyzer" : "str_search_analyzer",
>                     "index_analyzer" : "str_index_analyzer",
>                     "omit_norms": "true",
>                     "similarity": "index"
>                 },
>                 "co": {
>                     "type": "string",
>                     "search_analyzer" : "str_search_analyzer",
>                     "index_analyzer" : "str_index_analyzer",
>                     "omit_norms": "true",
>                     "similarity": "index"
>                 },
>                 "po": {
>                     "type": "double",
>                     "boost": 4.0
>                 },
>                 "en":{
>                     "type": "boolean"
>                 },
>                 "_oid":{
>                     "type": "long"
>                 },
>                 "text": {
>                     "type": "string",
>                     "search_analyzer" : "str_search_analyzer",
>                     "index_analyzer" : "str_index_analyzer",
>                     "omit_norms": "true",
>                     "similarity": "index"
>                 },
>                 "url": {
>                     "type": "string"
>                 }               
>          }
>      }
> }'
>
> Then, in my Java code, I build the query like this:
>
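> import org.elasticsearch.index.query.QueryBuilder;
> import org.elasticsearch.index.query.QueryBuilders;
> import org.elasticsearch.index.query.QueryStringQueryBuilder.Operator;
>
> // Multiply the text score by the popularity "po"; a missing or zero "po" counts as 1.
> String script = "_score * (doc['po'].empty ? 1 : doc['po'].value == 0.0 ? 1 : doc['po'].value)";
> QueryBuilder queryBuilder = QueryBuilders.customScoreQuery(
>         QueryBuilders.queryString(query)
>                 .field("text", 30)
>                 .field("ad")
>                 .field("st")
>                 .field("cn")
>                 .field("co")
>                 .defaultOperator(Operator.AND))
>         .script(script);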
>
> Some explanation of the fields:
> text: contains phrases like "things to do in goa"
> ad: address
> st: state
> cn: city name
> co: country
>
> Now, if I type "things to do in" in my autocompleter box, I get these 
> results:
>
> things to do in rann
> things to do in bulandshahr
> things to do in gondai
> things to do in rewa
> things to do in goa
>
> But I want "things to do in goa" on top.
>
> Earlier, I thought IDF was causing the problem, so I overrode the default 
> similarity with a CustomSimilarity that sets idf to 1. That still doesn't 
> solve my problem; instead, it started putting results like this on top:
>
> things to do in toronto
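For context, Lucene's default similarity (classic TF-IDF, which Elasticsearch 0.90 uses unless configured otherwise) computes idf from a term's document frequency alone, so it is identical for every document containing that term. The Java below reproduces the idf and tf formulas of Lucene's `DefaultSimilarity`, using `double` instead of `float`; `IdfSketch` is a hypothetical name. Since all of the "things to do in X" entries match the same query terms, they get the same idf weights, so fixing idf at 1 cannot by itself separate them. What still differs from document to document is term frequency.

```java
// Per-term weights from Lucene's classic DefaultSimilarity (TF-IDF), in double precision.
class IdfSketch {
    // idf depends only on the term's document frequency, never on which document is scored.
    static double idf(long docFreq, long numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // tf grows with the square root of how often the term occurs in the field.
    static double tf(double freq) {
        return Math.sqrt(freq);
    }
}
```

A term found in 9 of 10 documents gets an idf of exactly 1.0, and rarer terms get larger values.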
>
> I suspect I'm doing something wrong in my index_analyzer and 
> search_analyzer. I tried other tokenizers and token filters in different 
> orders but couldn't find a solution.
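To see why the five results tie, it helps to look at what the analyzers emit. The Java below is a rough model of the `str_index_analyzer` chain, not Lucene code. The class and method names are hypothetical, the standard tokenizer is approximated by splitting on non-letter/digit characters, and `asciifolding` is left out. Whether the query is analyzed as one string or word by word (the query_string parser splits on whitespace before analysis), every term produced from "things to do in" also appears in every "things to do in X" entry, so the query alone cannot tell "goa" from "rann".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Rough, hypothetical model of str_index_analyzer (not Lucene code):
// split on non-letter/digit characters (approximating the standard tokenizer),
// lowercase, emit unigrams plus 2..5-word shingles, then front edge n-grams
// of 2..30 characters. asciifolding is left out.
class AnalyzerSketch {
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) out.add(w);
        }
        return out;
    }

    // Unigrams plus shingles of minSize..maxSize words, joined by a single space.
    static List<String> shingles(List<String> words, int minSize, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            StringBuilder sb = new StringBuilder(words.get(i));
            out.add(sb.toString());
            for (int n = 2; n <= maxSize && i + n <= words.size(); n++) {
                sb.append(' ').append(words.get(i + n - 1));
                if (n >= minSize) out.add(sb.toString());
            }
        }
        return out;
    }

    // Front edge n-grams; tokens shorter than minGram produce nothing.
    static List<String> edgeNGrams(List<String> tokens, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            for (int len = minGram; len <= Math.min(maxGram, t.length()); len++) {
                out.add(t.substring(0, len));
            }
        }
        return out;
    }

    static List<String> analyze(String text) {
        return edgeNGrams(shingles(words(text), 2, 5), 2, 30);
    }

    // How many times a term is emitted for one field value (its term frequency).
    static long termFreq(List<String> terms, String term) {
        return terms.stream().filter(term::equals).count();
    }
}
```

In this model, "to" is emitted five times for "things to do in toronto" but only four times for "things to do in goa", because "toronto" itself starts with "to". With idf flattened to 1, a higher term frequency like this is one plausible reason that entry moved to the top.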
>  
> I could have implemented a simple prefix autocompleter, but then there 
> would be little point in using Elasticsearch, since matching terms in the 
> middle of a sentence gives users more flexibility. Also, in the travel 
> industry people search for the same thing in different ways. For example, 
> instead of typing exactly "things to do in", someone might write "what are 
> the best things to do in" or "what are things to do", among many other 
> variations. A prefix autocompleter won't handle that well. That's why I 
> tried implementing the autocompleter with Elasticsearch, but I'm not doing 
> it the right way.
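The difference between the two approaches can be made concrete. A prefix autocompleter only matches input against the start of an entry, while matching from any word boundary lets "things to do in" find an entry such as "what are the best things to do in goa". The helpers below are hypothetical and work on plain strings, not on any Elasticsearch API; they only illustrate the two behaviours.

```java
import java.util.Locale;

// Hypothetical helpers contrasting two ways of matching typed input against one entry.
class MatchStyles {
    // Plain prefix autocompletion: the input must match the start of the entry.
    static boolean prefixMatch(String entry, String input) {
        return entry.toLowerCase(Locale.ROOT).startsWith(input.toLowerCase(Locale.ROOT));
    }

    // Word-boundary ("infix") matching: the input may start at any word of the entry.
    static boolean wordBoundaryMatch(String entry, String input) {
        String e = entry.toLowerCase(Locale.ROOT);
        String in = input.toLowerCase(Locale.ROOT);
        if (e.startsWith(in)) return true;
        // Try every position right after a space, i.e. the start of each later word.
        for (int i = e.indexOf(' '); i >= 0; i = e.indexOf(' ', i + 1)) {
            if (e.startsWith(in, i + 1)) return true;
        }
        return false;
    }
}
```

With the completion suggester, a similar effect is commonly achieved by indexing several `input` strings for the same suggestion, since each input is still matched by prefix.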
>
> To improve the results, I also introduced a popularity factor (po) that is 
> updated on every user click, so that a clicked entry's score keeps 
> increasing through the custom score query. I also give the text field a 
> boost of 30 and lower boosts to the other fields. But something is still 
> not working.
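The arithmetic of the custom score script can be checked outside Elasticsearch. The Java below mirrors it (`PopularityScore` and `boosted` are hypothetical names, and a `null` `po` stands in for a document with no value). Entries without a popularity keep their text score, while any other `po` multiplies it directly, so a `po` of 0.5 would lower the score and a large `po` can outweigh differences in the text score.

```java
// Plain-Java mirror of the custom score script:
//   _score * (doc['po'].empty ? 1 : doc['po'].value == 0.0 ? 1 : doc['po'].value)
// A null 'po' models a document that has no value for the field.
class PopularityScore {
    static double boosted(double score, Double po) {
        if (po == null || po == 0.0) {
            return score;      // missing or zero popularity: score unchanged
        }
        return score * po;     // otherwise multiply by the raw popularity value
    }
}
```

For instance, `boosted(2.0, 5.0)` gives 10.0 and `boosted(2.0, null)` gives 2.0. Because the multiplier grows linearly with `po`, popularity can easily dominate the text score.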
>
> I suspect I'm not using Elasticsearch's capabilities properly for my use 
> case. Can you please help me with this?
>
> Thanks
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/3fb42188-c58a-4ab0-bcb8-48c1b075eb71%40googlegroups.com.
