[ 
https://issues.apache.org/jira/browse/LUCENE-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507440#comment-13507440
 ] 

Roman Chyla commented on LUCENE-4499:
-------------------------------------

Hi Nolan, your case seems to confirm the need for a solution. You decided 
to write a separate query parser; I have put the expanding logic into a query 
parser as well.

See this for the working example:
https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/test/org/apache/solr/analysis/TestAdsabsTypeFulltextParsing.java

And its config:
https://github.com/romanchyla/montysolr/blob/master/contrib/examples/adsabs/solr/collection1/conf/schema.xml#L325

I see two added benefits (besides not needing a query parser plugin - in our 
case, it must be plugged into our qparser):

 1. you can use the filter at index/query time inside a standard query parser
 2. special configuration for synonym expansion (for example, we have found it 
very useful to be able to search for multi-tokens in a case-insensitive manner, 
but recognize single tokens only case-sensitively; or to expand with multi-token 
synonyms only for multi-word originals and also output the original words, 
and otherwise eat the originals, i.e. replace them)
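
To illustrate the case-sensitivity rule in point 2, here is a minimal Python sketch. It is not taken from the patch or from Lucene; the two tables, their entries, and the `lookup` function are hypothetical and only show the idea of matching multi-token entries case-insensitively while matching single tokens exactly as typed.

```python
# Hypothetical synonym entries, invented for illustration only.
# Multi-token keys are stored lowercased so they match in any case.
MULTI_TOKEN = {("hubble", "space", "telescope"): ["HST"]}
# Single-token keys keep their original case and must match exactly.
SINGLE_TOKEN = {"HST": ["hubble space telescope"]}


def lookup(tokens):
    """Return the synonyms for a run of tokens, or None if no entry applies."""
    if not tokens:
        return None
    if len(tokens) > 1:
        # Multi-token run: compare case-insensitively.
        return MULTI_TOKEN.get(tuple(t.lower() for t in tokens))
    # Single token: compare case-sensitively.
    return SINGLE_TOKEN.get(tokens[0])


multi_hit = lookup(["Hubble", "Space", "Telescope"])  # matches despite the capitals
single_hit = lookup(["HST"])                          # exact case, matches
single_miss = lookup(["hst"])                         # wrong case, no match
```

With this split, an acronym is only expanded when it is written as an acronym, while the spelled-out phrase is found however it is capitalized.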

Nice blog post, I wish I could write as instructively :)
                
> Multi-word synonym filter (synonym expansion)
> ---------------------------------------------
>
>                 Key: LUCENE-4499
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4499
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/other
>    Affects Versions: 4.1, 5.0
>            Reporter: Roman Chyla
>            Priority: Minor
>              Labels: analysis, multi-word, synonyms
>             Fix For: 5.0
>
>         Attachments: LUCENE-4499.patch
>
>
> I apologize for bringing the multi-token synonym expansion up again. There is 
> an old, unresolved issue at LUCENE-1622 [1].
> While solving the problem for our needs [2], I discovered that the current 
> SolrSynonym parser (and the wonderful FST) have almost everything to 
> satisfactorily handle both the query and index time synonym expansion. It 
> seems that people often need to use the synonym filter *slightly* differently 
> at indexing and query time.
> In our case, we must do different things during indexing and querying.
> Example sentence: Mirrors of the Hubble space telescope pointed at XA5
> This is what we need (comma marks position bump):
> indexing: mirrors,hubble|hubble space telescope|hst,space,telescope,pointed,xa5|astroobject#5
> querying: +mirrors +(hubble space telescope | hst) +pointed +(xa5|astroobject#5)
> This translates to the following needs:
>   indexing time: 
>     single-token synonyms => return only synonyms
>     multi-token synonyms => return original tokens *AND* the synonyms
>   query time:
>     single-token: return only synonyms (but preserve case)
>     multi-token: return only synonyms
>  
> We need the original tokens for the proximity queries: if we indexed 'hubble 
> space telescope' as one token, we could not search for 'hubble NEAR telescope'.
> You may (not) be surprised, but Lucene already supports ALL of these 
> requirements. The patch is an attempt to state the problem differently. I am 
> not sure if it is the best option; however, it works perfectly for our needs 
> and it seems it could work for the general public too, especially if the 
> SynonymFilterFactory had a preconfigured set of SynonymMapBuilders and 
> people could just choose the one that fits their situation. Please look at 
> the unit test.
> links:
> [1] https://issues.apache.org/jira/browse/LUCENE-1622
> [2] http://labs.adsabs.harvard.edu/trac/ads-invenio/ticket/158
> [3] seems to be a similar request: 
> http://lucene.472066.n3.nabble.com/Proposal-Full-support-for-multi-word-synonyms-at-query-time-td4000522.html
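
The index-time and query-time rules from the issue description quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the attached patch and not Lucene's SynonymFilter: the synonym table, the stop-word list, and the function names (`analyze`, `longest_match`, `expand`, `render`) are all hypothetical, and the input is simply lowercased, so the "preserve case" part of the query-time rule is not modelled.

```python
# Hypothetical synonym table: a run of tokens -> terms to emit for that run.
SYNONYMS = {
    ("hubble", "space", "telescope"): ["hubble space telescope", "hst"],
    ("xa5",): ["xa5", "astroobject#5"],
}
# Assumed stop words; the issue does not say how they are removed.
STOPWORDS = {"of", "the", "at"}


def analyze(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def longest_match(tokens, start, synonyms):
    """Return the longest synonym key that begins at `start`, or None."""
    longest = max((len(key) for key in synonyms), default=0)
    for length in range(min(longest, len(tokens) - start), 0, -1):
        key = tuple(tokens[start:start + length])
        if key in synonyms:
            return key
    return None


def expand(tokens, mode, synonyms=SYNONYMS):
    """Return a list of positions; each position is a list of stacked terms."""
    positions = []
    i = 0
    while i < len(tokens):
        key = longest_match(tokens, i, synonyms)
        if key is None:
            positions.append([tokens[i]])
            i += 1
            continue
        if len(key) > 1 and mode == "index":
            # Multi-token at index time: keep the originals AND the synonyms.
            positions.append([key[0]] + synonyms[key])
            positions.extend([tok] for tok in key[1:])
        else:
            # Single-token (both modes) or multi-token at query time:
            # emit only the synonyms.
            positions.append(list(synonyms[key]))
        i += len(key)
    return positions


def render(positions):
    """Comma marks a position bump, '|' separates terms at one position."""
    return ",".join("|".join(terms) for terms in positions)


tokens = analyze("Mirrors of the Hubble space telescope pointed at XA5")
index_stream = render(expand(tokens, "index"))
query_stream = render(expand(tokens, "query"))
```

Because the original tokens keep their own positions at index time, a proximity query such as 'hubble NEAR telescope' still has something to match, while the query side sees only the synonym alternatives.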

