[ https://issues.apache.org/jira/browse/LUCENE-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593740#comment-13593740 ]
Arcadius Ahouansou commented on LUCENE-4499:
--------------------------------------------
Hello,
We are currently affected by this multi-word synonym bug.
I noticed that both this issue and SOLR-4381 have a very low priority.
Could the priority be raised so that the fix can get in?
In my opinion, this is a major issue.
Thanks.
> Multi-word synonym filter (synonym expansion)
> ---------------------------------------------
>
> Key: LUCENE-4499
> URL: https://issues.apache.org/jira/browse/LUCENE-4499
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/other
> Affects Versions: 4.1, 5.0
> Reporter: Roman Chyla
> Priority: Minor
> Labels: analysis, multi-word, synonyms
> Fix For: 5.0
>
> Attachments: LUCENE-4499.patch, LUCENE-4499.patch
>
>
> I apologize for bringing the multi-token synonym expansion up again. There is
> an old, unresolved issue at LUCENE-1622 [1].
> While solving the problem for our needs [2], I discovered that the current
> SolrSynonym parser (and the wonderful FST) have almost everything needed to
> handle both query-time and index-time synonym expansion satisfactorily. It
> seems that people often need to use the synonym filter *slightly* differently
> at indexing and query time.
> In our case, we must do different things during indexing and querying.
> Example sentence: Mirrors of the Hubble space telescope pointed at XA5
> This is what we need (comma marks position bump):
> indexing: mirrors,hubble|hubble space telescope|hst,space,telescope,pointed,xa5|astroobject#5
> querying: +mirrors +(hubble space telescope | hst) +pointed +(xa5|astroobject#5)
> This translates to the following needs:
> indexing time:
>   single-token synonyms => return only synonyms
>   multi-token synonyms => return original tokens *AND* the synonyms
> query time:
>   single-token synonyms => return only synonyms (but preserve case)
>   multi-token synonyms => return only synonyms
>
> We need the original tokens for proximity queries: if we indexed 'hubble
> space telescope' as one token, we could not search for 'hubble NEAR
> telescope'.
> You may (not) be surprised, but Lucene already supports ALL of these
> requirements. The patch is an attempt to state the problem differently. I am
> not sure it is the best option; however, it works perfectly for our needs
> and it seems it could work for the general public too, especially if the
> SynonymFilterFactory had preconfigured sets of SynonymMapBuilders and
> people could just choose the one that fits their situation. Please look at
> the unit test.
> links:
> [1] https://issues.apache.org/jira/browse/LUCENE-1622
> [2] http://labs.adsabs.harvard.edu/trac/ads-invenio/ticket/158
> [3] what seems to be a similar request:
> http://lucene.472066.n3.nabble.com/Proposal-Full-support-for-multi-word-synonyms-at-query-time-td4000522.html
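
For readers skimming the thread, the index-time and query-time rules quoted above can be sketched in a few lines of plain Python. This is only an illustration of the requested behaviour, not the patch and not Lucene's API: the synonym table, the stopword list, and the names expand, index_time and query_time are all made up for this example, and the sketch lowercases everything, so it ignores the "preserve case" requirement. Each inner list holds the terms stacked at one position.

```python
# Illustrative sketch only: not the LUCENE-4499 patch and not Lucene code.
# The stopword list and synonym table below are hypothetical.
STOPWORDS = {"of", "the", "at"}

# Maps a tuple of lowercased tokens to the terms that should replace it.
SYNONYMS = {
    ("hubble", "space", "telescope"): ["hubble space telescope", "hst"],
    ("xa5",): ["xa5", "astroobject#5"],
}

SENTENCE = "Mirrors of the Hubble space telescope pointed at XA5"


def expand(text, keep_originals_for_multi):
    """Return a list of positions; each position is a list of stacked terms."""
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    max_len = max(len(key) for key in SYNONYMS)
    positions = []
    i = 0
    while i < len(tokens):
        # Greedy longest match against the synonym table.
        match = None
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in SYNONYMS:
                match = key
                break
        if match is None:
            # No synonym: the token passes through unchanged.
            positions.append([tokens[i]])
            i += 1
        elif len(match) == 1 or not keep_originals_for_multi:
            # Single-token match, or query time: emit only the synonyms.
            positions.append(list(SYNONYMS[match]))
            i += len(match)
        else:
            # Index-time multi-token match: stack the synonyms on the first
            # original token, then keep the remaining originals in place.
            positions.append([match[0]] + SYNONYMS[match])
            positions.extend([t] for t in match[1:])
            i += len(match)
    return positions


def index_time(text):
    return expand(text, keep_originals_for_multi=True)


def query_time(text):
    return expand(text, keep_originals_for_multi=False)
```

Because the index-time stream keeps 'hubble', 'space' and 'telescope' at their own positions, a proximity query such as 'hubble NEAR telescope' still has something to match, which is the point made in the description.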