Ivan, thank you for feeding my curiosity! The first article really gave me an "a-ha!" moment when I saw the images of the synonym matching as directed graphs. It gave me some insight into why my multi-token synonyms were being expanded a certain way.
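
For anyone else following along, here is a rough Python sketch of the flattening I think the first article is describing. It is only a toy model and not Lucene's actual SynonymFilter code: the rule "dns -> domain name service", the sample text, and the function names flatten_synonym and phrase_matches are all made up for illustration. The assumption is that a multi-token synonym gets laid over the following positions of the original tokens instead of being kept as a separate path in the graph.

```python
from collections import defaultdict


def flatten_synonym(tokens, trigger, expansion):
    """Toy model: map each position to the terms stacked at that position.

    When `trigger` is seen, the tokens of `expansion` are written to
    successive positions starting at the trigger's position, so they
    overlap whatever original tokens come next.
    """
    positions = defaultdict(list)
    for pos, tok in enumerate(tokens):
        positions[pos].append(tok)
        if tok == trigger:
            for offset, syn in enumerate(expansion):
                positions[pos + offset].append(syn)
    return dict(positions)


def phrase_matches(positions, phrase):
    """True if the phrase terms can be found at consecutive positions."""
    for start in positions:
        if all(term in positions.get(start + i, [])
               for i, term in enumerate(phrase)):
            return True
    return False


# Hypothetical rule and text, chosen only to show the overlap.
stream = flatten_synonym(["dns", "is", "up"], "dns",
                         ["domain", "name", "service"])

for pos in sorted(stream):
    print(pos, sorted(stream[pos]))

# The expansion itself is findable as a phrase.
print(phrase_matches(stream, ["domain", "name", "service"]))  # True
# A phrase that mixes both paths also matches, which is the odd part.
print(phrase_matches(stream, ["dns", "name", "up"]))  # True
# The phrase a reader would expect to match does not.
print(phrase_matches(stream, ["domain", "name", "service", "is", "up"]))  # False
```

In this toy model "name" and "service" end up stacked on top of "is" and "up", which would explain the surprising phrase behavior. How the real filter handles offsets and position lengths is beyond what this sketch covers.
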
On Tuesday, July 22, 2014 4:37:45 PM UTC-5, Ivan Brusic wrote:
>
> I appreciate the fact that you want to know why you shouldn't use synonyms
> at query time. I couldn't find the following articles during my last
> response (I read them a while back and I have waaaaay too many bookmarks),
> but I finally found them:
>
> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
> http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
>
> --
> Ivan
>
>
> On Tue, Jul 22, 2014 at 11:03 AM, Ivan Brusic <[email protected]> wrote:
>
>> A couple of reasons. The biggest issue is multi word synonyms since the
>> query parser will tokenize the query before analysis is applied. Also,
>> scoring could be affected and the results can be screwy. Here is a better
>> write up:
>>
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>>
>> --
>> Ivan
>>
>>
>> On Tue, Jul 22, 2014 at 10:47 AM, Daniel Yim <[email protected]> wrote:
>>
>>> Thank you! That solved the initial issue.
>>>
>>> Could you expand on why I would need two analyzers? I did what you
>>> asked, but I am unsure of the reason behind it and would like to learn.
>>>
>>> Here are my updated settings:
>>>
>>> curl -XPUT "http://localhost:9200/personsearch" -d'
>>> {
>>>   "settings": {
>>>     "index": {
>>>       "analysis": {
>>>         "analyzer": {
>>>           "XYZSynAnalyzer": {
>>>             "tokenizer": "whitespace",
>>>             "filter": [
>>>               "lowercase",
>>>               "XYZSynFilter"
>>>             ]
>>>           },
>>>           "MyAnalyzer": {
>>>             "tokenizer": "standard",
>>>             "filter": [
>>>               "standard",
>>>               "lowercase",
>>>               "stop"
>>>             ]
>>>           }
>>>         },
>>>         "filter": {
>>>           "XYZSynFilter": {
>>>             "type": "synonym",
>>>             "synonyms": [
>>>               "aids, retrovirology"
>>>             ]
>>>           }
>>>         }
>>>       }
>>>     }
>>>   },
>>>   "mappings": {
>>>     "xyzemployee": {
>>>       "_all": {
>>>         "analyzer": "XYZSynAnalyzer"
>>>       },
>>>       "properties": {
>>>         "firstName": {
>>>           "type": "string"
>>>         },
>>>         "lastName": {
>>>           "type": "string"
>>>         },
>>>         "middleName": {
>>>           "type": "string",
>>>           "include_in_all": false,
>>>           "index": "not_analyzed"
>>>         },
>>>         "specialty": {
>>>           "type": "string",
>>>           "index_analyzer": "XYZSynAnalyzer",
>>>           "search_analyzer": "MyAnalyzer"
>>>         }
>>>       }
>>>     }
>>>   }
>>> }'
>>>
>>> On Tuesday, July 22, 2014 11:56:40 AM UTC-5, Ivan Brusic wrote:
>>>
>>>> Your issue is casing. You are only applying the synonym filter, which
>>>> by default does not lowercase terms. You can either set ignore_case to
>>>> true for the synonym filter or apply a lower case filter before the
>>>> synonym. I prefer to use the latter approach since I prefer to have all
>>>> my analyzed tokens lowercased.
>>>>
>>>> Also, you should only apply the synonym filter at index time. You would
>>>> need to create two similar analyzers, one with the synonym filter and one
>>>> without. You can set the different ones via index_analyzer and
>>>> search_analyzer.
>>>>
>>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#string
>>>>
>>>> Cheers,
>>>>
>>>> Ivan
>>>>
>>>>
>>>> On Tue, Jul 22, 2014 at 9:33 AM, Daniel Yim <[email protected]> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I am relatively new to elasticsearch and am having issues with getting
>>>>> my synonym filter to work. Can you take a look at the settings and tell
>>>>> me where I am going wrong?
>>>>>
>>>>> I am expecting the search for "aids" to match the search results if I
>>>>> were to search for "retrovirology", but this is not happening.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> curl -XDELETE "http://localhost:9200/personsearch"
>>>>>
>>>>> curl -XPUT "http://localhost:9200/personsearch" -d'
>>>>> {
>>>>>   "settings": {
>>>>>     "index": {
>>>>>       "analysis": {
>>>>>         "analyzer": {
>>>>>           "XYZSynAnalyzer": {
>>>>>             "tokenizer": "standard",
>>>>>             "filter": [
>>>>>               "XYZSynFilter"
>>>>>             ]
>>>>>           }
>>>>>         },
>>>>>         "filter": {
>>>>>           "XYZSynFilter": {
>>>>>             "type": "synonym",
>>>>>             "synonyms": [
>>>>>               "aids, retrovirology"
>>>>>             ]
>>>>>           }
>>>>>         }
>>>>>       }
>>>>>     }
>>>>>   },
>>>>>   "mappings": {
>>>>>     "xyzemployee": {
>>>>>       "_all": {
>>>>>         "analyzer": "XYZSynAnalyzer"
>>>>>       },
>>>>>       "properties": {
>>>>>         "firstName": {
>>>>>           "type": "string"
>>>>>         },
>>>>>         "lastName": {
>>>>>           "type": "string"
>>>>>         },
>>>>>         "middleName": {
>>>>>           "type": "string",
>>>>>           "include_in_all": false,
>>>>>           "index": "not_analyzed"
>>>>>         },
>>>>>         "specialty": {
>>>>>           "type": "string",
>>>>>           "analyzer": "XYZSynAnalyzer"
>>>>>         }
>>>>>       }
>>>>>     }
>>>>>   }
>>>>> }'
>>>>>
>>>>> curl -XPUT "http://localhost:9200/personsearch/xyzemployee/1" -d'
>>>>> {
>>>>>   "firstName": "Don",
>>>>>   "middleName": "W.",
>>>>>   "lastName": "White",
>>>>>   "specialty": "Adult Retrovirology"
>>>>> }'
>>>>>
>>>>> curl -XPUT "http://localhost:9200/personsearch/xyzemployee/2" -d'
>>>>> {
>>>>>   "firstName": "Terrance",
>>>>>   "middleName": "G.",
>>>>>   "lastName": "Gartner",
>>>>>   "specialty": "Retrovirology"
>>>>> }'
>>>>>
>>>>> curl -XPUT "http://localhost:9200/personsearch/xyzemployee/3" -d'
>>>>> {
>>>>>   "firstName": "Carter",
>>>>>   "middleName": "L.",
>>>>>   "lastName": "Taylor",
>>>>>   "specialty": "Pediatric Retrovirology"
>>>>> }'
>>>>>
>>>>> curl -XGET "http://localhost:9200/personsearch/xyzemployee/_search?pretty=true" -d'
>>>>> {
>>>>>   "query": {
>>>>>     "match": {
>>>>>       "specialty": "retrovirology"
>>>>>     }
>>>>>   }
>>>>> }'

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/cafbbaf9-c39e-4f6d-ad6c-e367d54bf8fa%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
