[ 
https://issues.apache.org/jira/browse/LUCENE-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16870696#comment-16870696
 ] 

Ruslan Dautkhanov commented on LUCENE-4499:
-------------------------------------------

Nolan Lawson’s blog also described a way to put different *weight* for synonyms 
.. is there is a way to do this using this in-Solr / Lucene functionality? 

Thank you.

> Multi-word synonym filter (synonym expansion)
> ---------------------------------------------
>
>                 Key: LUCENE-4499
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4499
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/other
>    Affects Versions: 4.1, 6.0
>            Reporter: Roman Chyla
>            Priority: Major
>              Labels: analysis, multi-word, synonyms
>             Fix For: 6.0
>
>         Attachments: LUCENE-4499.patch, LUCENE-4499.patch
>
>
> I apologize for bringing the multi-token synonym expansion up again. There is 
> an old, unresolved issue at LUCENE-1622 [1]
> While solving the problem for our needs [2], I discovered that the current 
> SolrSynonym parser (and the wonderful FTS) have almost everything to 
> satisfactorily handle both the query and index time synonym expansion. It 
> seems that people often need to use the synonym filter *slightly* differently 
> at indexing and query time.
> In our case, we must do different things during indexing and querying.
> Example sentence: Mirrors of the Hubble space telescope pointed at XA5
> This is what we need (comma marks position bump):
> indexing: mirrors,hubble|hubble space 
> telescope|hst,space,telescope,pointed,xa5|astroobject#5
> querying: +mirrors +(hubble space telescope | hst) +pointed 
> +(xa5|astroboject#5)
> This translated to following needs:
>   indexing time: 
>     single-token synonyms => return only synonyms
>     multi-token synonyms => return original tokens *AND* the synonyms
>   query time:
>     single-token: return only synonyms (but preserve case)
>     multi-token: return only synonyms
>  
> We need the original tokens for the proximity queries, if we indexed 'hubble 
> space telescope'
> as one token, we cannot search for 'hubble NEAR telescope'
> You may (not) be surprised, but Lucene already supports ALL of these 
> requirements. The patch is an attempt to state the problem differently. I am 
> not sure if it is the best option, however it works perfectly for our needs 
> and it seems it could work for general public too. Especially if the 
> SynonymFilterFactory had a preconfigured sets of SynonymMapBuilders - and 
> people would just choose what situation they use. Please look at the unittest.
> links:
> [1] https://issues.apache.org/jira/browse/LUCENE-1622
> [2] http://labs.adsabs.harvard.edu/trac/ads-invenio/ticket/158
> [3] seems to have similar request: 
> http://lucene.472066.n3.nabble.com/Proposal-Full-support-for-multi-word-synonyms-at-query-time-td4000522.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to