Hi all, I have a requirement to find distinct company names. I was using the "keyword" tokenizer for that field, and with a terms facet I was able to get distinct company names. However, the terms facet treated company names like "ibm suisse", "ibm corporation", and "ibm" as different companies. The online documentation suggested using a synonym filter to solve this. My settings are:

curl -XPUT 'http://localhost:9200/dataindex/' -d '{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "customAnalyzer": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": [ "lowercase", "synonym" ]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "tokenizer": "keyword",
            "synonyms_path": "analysis/synonym.txt"
          }
        }
      }
    }
  }
}'

My mapping is:

curl -XPUT 'http://localhost:9200/dataindex/tweet/_mapping' -d '{
  "tweet": {
    "properties": {
      "company": {
        "type": "string",
        "analyzer": "customAnalyzer"
      }
    }
  }
}'

In the synonym.txt file I have:

ibm suisse, ibm corporation, ibm business, ibm => ibm corp ltd

Indexed data:

curl -XPUT 'http://localhost:9200/dataindex/tweet/1' -d '{ "company" : "ibm" }'
curl -XPUT 'http://localhost:9200/dataindex/tweet/2' -d '{ "company" : "ibm corporation" }'
curl -XPUT 'http://localhost:9200/dataindex/tweet/3' -d '{ "company" : "ibm suisse" }'
curl -XPUT 'http://localhost:9200/dataindex/tweet/4' -d '{ "company" : "ibm business" }'

If I run a terms facet:

{
  "facets": {
    "loc_facet": {
      "terms": { "field": "company" }
    }
  }
}

I get 3 terms, i.e.

{term: ibm corp ltd, count: 3}
{term: suisse, count: 1}
{term: corporation, count: 1}

I want the facet result to return only one term: ibm corp ltd with count=3. That way I will get distinct company names and also map the synonym names to a single company name.

Please correct me if I am using the wrong tokenizer or if my approach is not correct.
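
To reason about where the extra terms might come from, here is a minimal sketch in plain Python. It is a simplified model, not Elasticsearch code. It assumes that the synonym filter only matches whole tokens, and that "tokenizer": "keyword" on the filter keeps each entry of synonym.txt as a single token. The function names are hypothetical, and the counts in this model need not match the facet output quoted above.

```python
from collections import Counter

# Synonym rule from synonym.txt, modelled as whole-token mappings.
# Assumption: the filter's "tokenizer": "keyword" keeps each entry as one token.
SYNONYMS = {
    "ibm suisse": "ibm corp ltd",
    "ibm corporation": "ibm corp ltd",
    "ibm business": "ibm corp ltd",
    "ibm": "ibm corp ltd",
}

# The four indexed "company" values from the post.
DOCS = ["ibm", "ibm corporation", "ibm suisse", "ibm business"]


def whitespace_tokenize(text):
    """Model of a whitespace tokenizer: one token per word."""
    return text.split()


def keyword_tokenize(text):
    """Model of a keyword tokenizer: the whole value is a single token."""
    return [text]


def analyze(text, tokenize):
    """Lowercase the text, tokenize it, then replace tokens that match a synonym entry."""
    tokens = tokenize(text.lower())
    return [SYNONYMS.get(token, token) for token in tokens]


def facet(docs, tokenize):
    """Count, per term, how many documents contain that term."""
    counts = Counter()
    for doc in docs:
        counts.update(set(analyze(doc, tokenize)))
    return counts


whitespace_facet = facet(DOCS, whitespace_tokenize)
keyword_facet = facet(DOCS, keyword_tokenize)

print("whitespace tokenizer:", dict(whitespace_facet))
print("keyword tokenizer:   ", dict(keyword_facet))
```

In this model the whitespace variant splits "ibm suisse" into "ibm" and "suisse" before the synonym step, so only "ibm" is rewritten and the leftover words show up as their own terms. The keyword variant collapses every value into the single term "ibm corp ltd". That suggests, but does not prove, that using the keyword tokenizer in the analyzer itself is worth trying.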
