How to use Elasticsearch to implement an autocompleter?

coder Fri, 17 Jan 2014 08:16:53 -0800

Hi,

I'm trying to use Elasticsearch to implement an autocompleter for my
college project, similar to the ones some travel websites have, but I'm
facing some issues with the implementation.

I'm using the following index settings and mapping:

curl -XPUT 'http://localhost:9200/auto_index/' -d '{
    "settings" : {
        "index" : {
            "number_of_shards" : 1,
            "number_of_replicas" : 1,
            "analysis" : {
                "analyzer" : {
                    "str_search_analyzer" : {
                        "tokenizer" : "standard",
                        "filter" : ["lowercase", "asciifolding", "suggestions_shingle", "edgengram"]
                    },
                    "str_index_analyzer" : {
                        "tokenizer" : "standard",
                        "filter" : ["lowercase", "asciifolding", "suggestions_shingle", "edgengram"]
                    }
                },
                "filter" : {
                    "suggestions_shingle" : {
                        "type" : "shingle",
                        "min_shingle_size" : 2,
                        "max_shingle_size" : 5
                    },
                    "edgengram" : {
                        "type" : "edgeNGram",
                        "min_gram" : 2,
                        "max_gram" : 30,
                        "side" : "front"
                    },
                    "mynGram" : {
                        "type" : "nGram",
                        "min_gram" : 2,
                        "max_gram" : 30
                    }
                }
            },
            "similarity" : {
                "index" : {
                    "type" : "org.elasticsearch.index.similarity.CustomSimilarityProvider"
                },
                "search" : {
                    "type" : "org.elasticsearch.index.similarity.CustomSimilarityProvider"
                }
            }
        }
    }
}'
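To see roughly what this analysis chain emits (standard tokenizer, then lowercase, asciifolding, a shingle filter of 2 to 5 words that also keeps single words, since `output_unigrams` defaults to true, then front edge n-grams of 2 to 30 characters), here is a plain-Java approximation. It is a sketch, not Lucene: the tokenizer is simplified to splitting on non-alphanumeric ASCII characters, and accent stripping stands in for asciifolding.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Approximation of: standard tokenizer -> lowercase -> asciifolding
// -> shingle(2..5, unigrams kept) -> edgeNGram(2..30, front). Not Lucene code.
class AnalyzerSketch {

    // Simplified tokenizer: strip accents (stand-in for asciifolding), lowercase,
    // then split on anything that is not an ASCII letter or digit.
    static List<String> baseTokens(String text) {
        String folded = Normalizer.normalize(text, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "")
                .toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        for (String t : folded.split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Shingles: each single word plus word runs of size min..max joined by a space.
    static List<String> shingles(List<String> tokens, int min, int max) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(tokens.get(i));
            for (int n = min; n <= max && i + n <= tokens.size(); n++) {
                out.add(String.join(" ", tokens.subList(i, i + n)));
            }
        }
        return out;
    }

    // Front edge n-grams of every token; tokens shorter than minGram emit nothing.
    static List<String> edgeNGrams(List<String> tokens, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            for (int len = minGram; len <= Math.min(maxGram, t.length()); len++) {
                out.add(t.substring(0, len));
            }
        }
        return out;
    }

    static List<String> analyze(String text) {
        return edgeNGrams(shingles(baseTokens(text), 2, 5), 2, 30);
    }

    public static void main(String[] args) {
        List<String> terms = analyze("things to do in goa");
        System.out.println(terms.size() + " terms: " + terms);
    }
}
```

In this sketch, "things to do in goa" expands to 100 overlapping terms. If the search side runs the same chain, the typed text "things to do in" is expanded the same way, and every "things to do in ..." suggestion contains all of those terms, which may be part of why such suggestions end up with nearly identical scores.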

curl -XPUT 'localhost:9200/auto_index/autocomplete/_mapping' -d '{
    "autocomplete":{
       "_boost" : {
            "name" : "po", 
            "null_value" : 4.0
       },
       "properties": {
                "ad": {
                    "type": "string",
                    "search_analyzer" : "str_search_analyzer",
                    "index_analyzer" : "str_index_analyzer",
                    "omit_norms": "true",
                    "similarity": "index"
                },
                "category": {
                    "type": "string",
                    "include_in_all" : false
                },
                "cn": {
                    "type": "string",
                    "search_analyzer" : "str_search_analyzer",
                    "index_analyzer" : "str_index_analyzer",
                    "omit_norms": "true",
                    "similarity": "index"
                },
                "ctype": {
                    "type": "string",
                    "search_analyzer" : "keyword",
                    "index_analyzer" : "keyword",
                    "omit_norms": "true",
                    "similarity": "index"
                },
                "eid": {
                    "type": "string",
                    "include_in_all" : false
                },
                "st": {
                    "type": "string",
                    "search_analyzer" : "str_search_analyzer",
                    "index_analyzer" : "str_index_analyzer",
                    "omit_norms": "true",
                    "similarity": "index"
                },
                "co": {
                    "type": "string",
                    "search_analyzer" : "str_search_analyzer",
                    "index_analyzer" : "str_index_analyzer",
                    "omit_norms": "true",
                    "similarity": "index"
                },
                "po": {
                    "type": "double",
                    "boost": 4.0
                },
                "en":{
                    "type": "boolean"
                },
                "_oid":{
                    "type": "long"
                },
                "text": {
                    "type": "string",
                    "search_analyzer" : "str_search_analyzer",
                    "index_analyzer" : "str_index_analyzer",
                    "omit_norms": "true",
                    "similarity": "index"
                },
                "url": {
                    "type": "string"
                }               
         }
     }
}'

Then, in my Java code, I build the query like this:

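import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.QueryStringQueryBuilder.Operator;

// Multiply the relevance score by the popularity value "po" (missing or 0 leaves it unchanged).
String script = "_score * (doc['po'].empty ? 1 : doc['po'].value == 0.0 ? 1 : doc['po'].value)";

// Search "text" (boost 30) plus the location fields; every query term must match.
QueryBuilder queryBuilder = QueryBuilders.customScoreQuery(
        QueryBuilders.queryString(query)
                .field("text", 30)
                .field("ad")
                .field("st")
                .field("cn")
                .field("co")
                .defaultOperator(Operator.AND))
        .script(script);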
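The script keeps the relevance score as-is when `po` is missing or 0, and otherwise multiplies it by the raw `po` value. A plain-Java mirror of that arithmetic, where modelling a missing value as `null` is an assumption of this sketch:

```java
// Mirrors: _score * (doc['po'].empty ? 1 : doc['po'].value == 0.0 ? 1 : doc['po'].value)
class PopularityScoreSketch {

    // A null po stands in for doc['po'].empty (an assumption of this sketch).
    static double customScore(double score, Double po) {
        double multiplier = (po == null || po == 0.0) ? 1.0 : po;
        return score * multiplier;
    }

    public static void main(String[] args) {
        System.out.println(customScore(2.0, null)); // 2.0
        System.out.println(customScore(2.0, 0.0));  // 2.0
        System.out.println(customScore(2.0, 3.0));  // 6.0
        System.out.println(customScore(2.0, 0.5));  // 1.0
    }
}
```

Note that a popularity value between 0 and 1 lowers the score rather than raising it.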

A short explanation of the fields:
text: phrases such as "things to do in goa"
ad: address
st: state
cn: city name
co: country

Now, if I type "things to do in" into my autocompleter box, I get these
results:

things to do in rann
things to do in bulandshahr
things to do in gondai
things to do in rewa
things to do in goa

But I want "things to do in goa" on top.

Earlier, I thought idf in Elasticsearch was causing the problem, so I
overrode the default similarity with a CustomSimilarity that sets idf to 1.
But that still doesn't solve my problem. Instead, it started putting this
result on top:

things to do in toronto
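For reference, Lucene's default TF-IDF similarity computes idf as 1 + ln(numDocs / (docFreq + 1)). A small sketch of that formula (using double instead of Lucene's float; the document counts below are made up):

```java
// Classic Lucene TF-IDF idf: 1 + ln(numDocs / (docFreq + 1)).
class IdfSketch {

    static double classicIdf(long docFreq, long numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        long numDocs = 1000; // hypothetical index size
        // A term present in almost every suggestion vs. a term present in only one.
        System.out.printf("common: %.3f%n", classicIdf(999, numDocs));
        System.out.printf("rare:   %.3f%n", classicIdf(1, numDocs));
    }
}
```

Because idf is a per-term value, documents that match exactly the same query terms receive the same idf contribution, so forcing idf to 1 mainly changes the ranking when documents match different subsets of terms (for example through different fields or n-grams).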

I suspect I'm doing something wrong in my index_analyzer and
search_analyzer. I tried other tokenizers and token filters, in different
orders, but couldn't find a solution.

I could have implemented a simple prefix autocompleter, but then there
would be little point in using Elasticsearch: matching terms in the middle
of sentences gives users more flexibility. Also, in the travel industry
people search for the same thing in different ways. Instead of typing
exactly "things to do in", someone might write "what are the best things
to do in", "what are things to do", or many other variations. A prefix
autocompleter won't handle that well, which is why I tried implementing
the autocompleter with Elasticsearch, but I'm clearly not doing it the
right way.
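With `defaultOperator(Operator.AND)`, every term of the typed text has to match in at least one of the searched fields. A toy sketch on whitespace-split terms, which ignores the analyzers entirely and so only illustrates the operator, shows how that interacts with a rephrased query:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model of default_operator AND vs. OR; real matching depends on the analyzers.
class OperatorSketch {

    static Set<String> terms(String s) {
        return new HashSet<>(Arrays.asList(s.toLowerCase(Locale.ROOT).trim().split("\\s+")));
    }

    // AND: every query term must appear in the document.
    static boolean matchesAll(String query, String doc) {
        return terms(doc).containsAll(terms(query));
    }

    // OR: at least one query term must appear in the document.
    static boolean matchesAny(String query, String doc) {
        Set<String> docTerms = terms(doc);
        for (String t : terms(query)) {
            if (docTerms.contains(t)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = "things to do in goa";
        String query = "what are the best things to do in";
        System.out.println("AND: " + matchesAll(query, doc)); // false
        System.out.println("OR:  " + matchesAny(query, doc)); // true
    }
}
```

Under this simplified model, the rephrased query only matches under AND if the extra words ("what", "are", "the", "best") also occur in the suggestion.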

For better results, I also introduced a popularity factor ("po") that is
updated on every user click, so that a suggestion's score keeps increasing
in later searches through the custom score query. I'm also giving the text
field a boost of 30 and lower weight to the other fields. But something is
still not right.

I guess I'm not using Elasticsearch's capabilities properly for my use
case. Can you please help me with this?

Thanks

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/39ce69bc-e2b8-4c27-9240-d6dbcc5a0656%40googlegroups.com.
