Hi All,

I need to find the distinct company names in my index. I was using the 
"keyword" tokenizer for that field, and a terms facet gave me the distinct 
company names. However, the terms facet treats names like "ibm suisse", 
"ibm corporation" and "ibm" as different companies.
The online documentation suggested using a synonym filter to solve this. My 
settings are:

curl -XPUT 'http://localhost:9200/dataindex/' -d '{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "customAnalyzer": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": [
              "lowercase","synonym"  
            ]
          }
        },
        "filter": {
          "synonym" : {
            "type" : "synonym",
            "tokenizer": "keyword",
            "synonyms_path" : "analysis/synonym.txt"
          }
        }
      }
    }
  }
}'

My mapping is:

curl -XPUT 'http://localhost:9200/dataindex/tweet/_mapping' -d '
{             
    "tweet" : {
        "properties" : {
            "company": {
                 "type": "string",
                 "analyzer": "customAnalyzer"
            }  
        }               
    }       
}'

In the synonym.txt file I have:

ibm suisse, ibm corporation, ibm business, ibm => ibm corp ltd

Indexed data:
curl -XPUT 'http://localhost:9200/dataindex/tweet/1' -d '{
    "company" : "ibm"
}'
curl -XPUT 'http://localhost:9200/dataindex/tweet/2' -d '{
    "company" : "ibm corporation"
}'
curl -XPUT 'http://localhost:9200/dataindex/tweet/3' -d '{
    "company" : "ibm suisse"
}'
curl -XPUT 'http://localhost:9200/dataindex/tweet/4' -d '{
    "company" : "ibm business"
}'

If I run a terms facet:
{
  "facets": {
    "loc_facet": {
      "terms": {
        "field": "company"
      }
    }
  }
}
I get 3 terms, i.e. {term: ibm corp ltd, count: 3} {term: suisse, count: 1} 
{term: corporation, count: 1}.
I want the facet to return only one term: ibm corp ltd with count=3. That 
way I would get distinct company names and also map the synonym names onto 
a single company name.
Please correct me if I am using the wrong tokenizer or if my approach is 
not correct.
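
To show what I suspect is going on, here is a rough toy model in Python. It 
is not Elasticsearch code, and all function names are made up for 
illustration. It assumes that "tokenizer": "keyword" on the synonym filter 
keeps each entry of the rule as one token, and that the analyzer's own 
tokenizer decides what the filter gets to see.

```python
from collections import Counter

# Toy model of the analysis chain (hypothetical helpers, NOT Elasticsearch).
SYNONYM_RULE = "ibm suisse, ibm corporation, ibm business, ibm => ibm corp ltd"
DOCS = ["ibm", "ibm corporation", "ibm suisse", "ibm business"]


def parse_rule(rule):
    """Turn 'a, b => c' into {a: c, b: c}; each entry stays one token
    (assumed effect of "tokenizer": "keyword" on the synonym filter)."""
    left, right = rule.split("=>")
    target = right.strip()
    return {entry.strip(): target for entry in left.split(",")}


def whitespace_tokenizer(text):
    # Splits on whitespace, like the analyzer in my settings.
    return text.split()


def keyword_tokenizer(text):
    # Emits the whole field value as a single token.
    return [text]


def analyze(text, tokenizer, synonyms):
    """tokenizer -> lowercase -> single-token synonym lookup."""
    tokens = [token.lower() for token in tokenizer(text)]
    return [synonyms.get(token, token) for token in tokens]


def facet(docs, tokenizer, synonyms):
    """Count, per term, how many documents contain it."""
    counts = Counter()
    for doc in docs:
        counts.update(set(analyze(doc, tokenizer, synonyms)))
    return dict(counts)


SYNONYMS = parse_rule(SYNONYM_RULE)

if __name__ == "__main__":
    for name, tokenizer in [("whitespace", whitespace_tokenizer),
                            ("keyword", keyword_tokenizer)]:
        print(name, facet(DOCS, tokenizer, SYNONYMS))
```

In this toy model the whitespace tokenizer splits "ibm suisse" into "ibm" 
and "suisse" before the synonym lookup, so only "ibm" is rewritten and 
"suisse" is left over as its own term. With the keyword tokenizer the whole 
value reaches the lookup and every sample value collapses to ibm corp ltd. 
I am not sure the real filter behaves exactly like this.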
