Re: Help with Synonyms

Daniel Yim Wed, 23 Jul 2014 10:21:42 -0700

Ivan, thank you for feeding my curiosity! The first article really gave me an
"a-ha!" moment when I saw the images of synonym matching drawn as directed
graphs. It gave me some insight into why my multi-token synonyms were being
expanded the way they were.
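For anyone who finds this thread later, here is a toy Python model (not Elasticsearch or Lucene code) of Ivan's point that query-time synonyms break down for multi-word rules. It assumes a whitespace tokenizer, a lowercase filter, and a synonym filter that does greedy longest-phrase matching. The multi-word rule "hiv, human immunodeficiency virus" and the function names are hypothetical, and real Lucene emits a token graph with positions rather than the flat set of terms used here.

```python
# Toy model of an analysis chain: whitespace tokenizer -> lowercase -> synonym filter.
# Assumption: terms are returned as a flat set; real Lucene keeps positions and
# builds a token graph for multi-word synonyms.

SYNONYM_GROUPS = [
    ["aids", "retrovirology"],                # rule from this thread
    ["hiv", "human immunodeficiency virus"],  # hypothetical multi-word rule
]

# Map each phrase (as a tuple of words) to the full group it belongs to.
RULES = {tuple(p.split()): group for group in SYNONYM_GROUPS for p in group}
MAX_LEN = max(len(key) for key in RULES)


def analyze(text):
    """Lowercase, then expand synonyms using greedy longest-phrase matching."""
    tokens = [t.lower() for t in text.split()]
    terms = set()
    i = 0
    while i < len(tokens):
        # Try the longest possible phrase starting at position i first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            group = RULES.get(tuple(tokens[i:i + n]))
            if group is not None:
                for phrase in group:
                    terms.update(phrase.split())
                i += n
                break
        else:
            # No rule starts here: keep the token unchanged.
            terms.add(tokens[i])
            i += 1
    return terms


def parse_then_analyze(query):
    """Model a query parser that splits on whitespace *before* analysis,
    so the synonym filter only ever sees one word at a time."""
    terms = set()
    for word in query.split():
        terms |= analyze(word)
    return terms


# Index time: the whole field value is analyzed, so the phrase rule fires.
print(sorted(analyze("Human Immunodeficiency Virus")))
# ['hiv', 'human', 'immunodeficiency', 'virus']

# Query time: the parser has already split the phrase, so "hiv" is never added.
print(sorted(parse_then_analyze("human immunodeficiency virus")))
# ['human', 'immunodeficiency', 'virus']

# Query time, one-word synonym: expands into the phrase's individual words,
# which are no longer tied together as a phrase.
print(sorted(parse_then_analyze("hiv")))
# ['hiv', 'human', 'immunodeficiency', 'virus']
```

In this model, pairing a synonym analyzer at index time with a synonym-free search analyzer (as in the updated settings in this thread) sidesteps the second case: the plain query terms "human", "immunodeficiency" and "virus" still match a document whose field was expanded when it was indexed.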


On Tuesday, July 22, 2014 4:37:45 PM UTC-5, Ivan Brusic wrote:
>
> I appreciate that you want to know why you shouldn't use synonyms at
> query time. I couldn't find the following articles when I wrote my last
> response (I read them a while back and I have waaaaay too many bookmarks),
> but I finally found them:
>
>
> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
> http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
>
> -- 
> Ivan
>
>
> On Tue, Jul 22, 2014 at 11:03 AM, Ivan Brusic <[email protected]> wrote:
>
>> A couple of reasons. The biggest issue is multi-word synonyms: the query
>> parser splits the query on whitespace before analysis is applied, so the
>> synonym filter never sees the whole phrase. Also, scoring could be
>> affected and the results can be screwy. Here is a better write-up:
>>
>>
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>>
>> -- 
>> Ivan
>>
>>
>> On Tue, Jul 22, 2014 at 10:47 AM, Daniel Yim <[email protected]> wrote:
>>
>>> Thank you! That solved the initial issue.
>>>
>>> Could you expand on why I would need two analyzers? I did what you 
>>> asked, but I am unsure of the reason behind it and would like to learn.
>>>
>>> Here are my updated settings:
>>>
>>> curl -XPUT "http://localhost:9200/personsearch" -d'
>>> {
>>>   "settings": {
>>>     "index": {
>>>       "analysis": {
>>>         "analyzer": {
>>>           "XYZSynAnalyzer": {
>>>             "tokenizer": "whitespace",
>>>             "filter": [
>>>               "lowercase",
>>>               "XYZSynFilter"
>>>             ]
>>>           },
>>>           "MyAnalyzer": {
>>>             "tokenizer": "standard",
>>>             "filter": [
>>>               "standard",
>>>               "lowercase",
>>>               "stop"
>>>             ]
>>>           }
>>>         },
>>>         "filter": {
>>>           "XYZSynFilter": {
>>>             "type": "synonym",
>>>             "synonyms": [
>>>               "aids, retrovirology"
>>>             ]
>>>           }
>>>         }
>>>       }
>>>     }
>>>   },
>>>   "mappings": {
>>>     "xyzemployee": {
>>>       "_all": {
>>>         "analyzer": "XYZSynAnalyzer"
>>>       },
>>>       "properties": {
>>>         "firstName": {
>>>           "type": "string"
>>>         },
>>>         "lastName": {
>>>           "type": "string"
>>>         },
>>>         "middleName": {
>>>           "type": "string",
>>>           "include_in_all": false,
>>>           "index": "not_analyzed"
>>>         },
>>>         "specialty": {
>>>           "type": "string",
>>>           "index_analyzer": "XYZSynAnalyzer",
>>>           "search_analyzer": "MyAnalyzer"
>>>         }
>>>       }
>>>     }
>>>   }
>>> }'
>>>
>>> On Tuesday, July 22, 2014 11:56:40 AM UTC-5, Ivan Brusic wrote:
>>>
>>>> Your issue is casing. You are only applying the synonym filter, which
>>>> matches case-sensitively by default, so the token "Retrovirology" from
>>>> your documents never matches the lowercase rule "aids, retrovirology".
>>>> You can either set ignore_case to true for the synonym filter or apply
>>>> a lowercase filter before the synonym filter. I prefer the latter
>>>> approach since I like to have all my analyzed tokens lowercased.
>>>>
>>>> Also, you should only apply the synonym filter at index time. You would
>>>> need to create two similar analyzers, one with the synonym filter and one
>>>> without, and assign them separately via index_analyzer and
>>>> search_analyzer.
>>>>
>>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#string
>>>>
>>>> Cheers,
>>>>
>>>> Ivan
>>>>
>>>>
>>>> On Tue, Jul 22, 2014 at 9:33 AM, Daniel Yim <[email protected]> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I am relatively new to elasticsearch and am having issues getting my
>>>>> synonym filter to work. Can you take a look at the settings and tell me
>>>>> where I am going wrong?
>>>>>
>>>>> I expect a search for "aids" to return the same results as a search for
>>>>> "retrovirology", but this is not happening.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> curl -XDELETE "http://localhost:9200/personsearch"
>>>>>
>>>>> curl -XPUT "http://localhost:9200/personsearch" -d'
>>>>> {
>>>>>   "settings": {
>>>>>     "index": {
>>>>>       "analysis": {
>>>>>         "analyzer": {
>>>>>           "XYZSynAnalyzer": {
>>>>>             "tokenizer": "standard",
>>>>>             "filter": [
>>>>>               "XYZSynFilter"
>>>>>             ]
>>>>>           }
>>>>>         },
>>>>>         "filter": {
>>>>>           "XYZSynFilter": {
>>>>>             "type": "synonym",
>>>>>             "synonyms": [
>>>>>               "aids, retrovirology"
>>>>>             ]
>>>>>           }
>>>>>         }
>>>>>       }
>>>>>     }
>>>>>   },
>>>>>   "mappings": {
>>>>>     "xyzemployee": {
>>>>>       "_all": {
>>>>>         "analyzer": "XYZSynAnalyzer"
>>>>>       },
>>>>>       "properties": {
>>>>>         "firstName": {
>>>>>           "type": "string"
>>>>>         },
>>>>>         "lastName": {
>>>>>           "type": "string"
>>>>>         },
>>>>>         "middleName": {
>>>>>           "type": "string",
>>>>>           "include_in_all": false,
>>>>>           "index": "not_analyzed"
>>>>>         },
>>>>>         "specialty": {
>>>>>           "type": "string",
>>>>>           "analyzer": "XYZSynAnalyzer"
>>>>>         }
>>>>>       }
>>>>>     }
>>>>>   }
>>>>> }'
>>>>>
>>>>> curl -XPUT "http://localhost:9200/personsearch/xyzemployee/1" -d'
>>>>> {
>>>>>   "firstName": "Don",
>>>>>   "middleName": "W.",
>>>>>   "lastName": "White",
>>>>>   "specialty": "Adult Retrovirology"
>>>>> }'
>>>>>
>>>>> curl -XPUT "http://localhost:9200/personsearch/xyzemployee/2" -d'
>>>>> {
>>>>>   "firstName": "Terrance",
>>>>>   "middleName": "G.",
>>>>>   "lastName": "Gartner",
>>>>>   "specialty": "Retrovirology"
>>>>> }'
>>>>>
>>>>> curl -XPUT "http://localhost:9200/personsearch/xyzemployee/3" -d'
>>>>> {
>>>>>   "firstName": "Carter",
>>>>>   "middleName": "L.",
>>>>>   "lastName": "Taylor",
>>>>>   "specialty": "Pediatric Retrovirology"
>>>>> }'
>>>>>
>>>>> curl -XGET "http://localhost:9200/personsearch/xyzemployee/_search?pretty=true" -d'
>>>>> {
>>>>>   "query": {
>>>>>     "match": {
>>>>>       "specialty": "retrovirology"
>>>>>     }
>>>>>   }
>>>>> }'
>>>>>
>>>>>  -- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "elasticsearch" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to [email protected].
>>>>>
>>>>> To view this discussion on the web visit https://groups.google.com/d/
>>>>> msgid/elasticsearch/2e227a33-d935-4d22-89fb-57b59358c89d%
>>>>> 40googlegroups.com 
>>>>> <https://groups.google.com/d/msgid/elasticsearch/2e227a33-d935-4d22-89fb-57b59358c89d%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>
>>
>>
>
