[jira] [Commented] (LUCENE-6664) Replace SynonymFilter with SynonymGraphFilter

Jack Krupansky (JIRA) Sun, 04 Oct 2015 08:31:59 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942686#comment-14942686
 ]


Jack Krupansky commented on LUCENE-6664:
----------------------------------------

Hey [~mikemccand], don't get discouraged, this was a very valuable exercise. I 
am a solid proponent of getting multi-term synonyms working in a full and 
robust manner, but I recognize that they just don't fit in cleanly with the 
existing flat token stream architecture. That's life. In any case, don't give 
up on this long-term effort.

Maybe the best thing for now is to retain the traditional flat synonym filter 
for compatibility, fully add the new SynonymGraphFilter, and then add the 
optional ability to enable graph support in the main Lucene query parser. 
(Alas, Solr has its own fork of the Lucene query parser.) Support within 
phrase queries is the tricky part.

It would also be good to address the issue with non-phrase terms being analyzed 
separately - the query parser should recognize that adjacent terms without 
operators should be analyzed as a group so that multi-token synonyms can be matched.
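
To make the last point concrete, here is a minimal sketch of why per-term analysis misses multi-token synonyms. It is a toy model in plain Python, not Lucene or Solr code: the SYNONYMS table and the analyze function are hypothetical stand-ins for a synonym filter, and the terms are assumed to be already whitespace-split by a query parser.

```python
# Hypothetical toy synonym table: a multi-token phrase maps to a one-token synonym.
SYNONYMS = {("united", "states"): "usa"}


def analyze(terms):
    """Toy analyzer: return the input terms plus any synonym whose phrase occurs in them."""
    out = list(terms)
    for phrase, synonym in SYNONYMS.items():
        n = len(phrase)
        # Slide a window of the phrase length over the terms handed to the analyzer.
        for i in range(len(terms) - n + 1):
            if tuple(terms[i:i + n]) == phrase:
                out.append(synonym)
    return out


query_terms = ["united", "states"]

# Per-term analysis: each term is analyzed alone, so the two-token phrase is never seen.
per_term = [tok for term in query_terms for tok in analyze([term])]

# Grouped analysis: adjacent terms without operators are analyzed together.
grouped = analyze(query_terms)

print(per_term)
print(grouped)
```

In this toy model only the grouped call can add "usa", because the synonym rule needs to see both tokens in the same analysis pass.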

> Replace SynonymFilter with SynonymGraphFilter
> ---------------------------------------------
>
>                 Key: LUCENE-6664
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6664
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-6664.patch, LUCENE-6664.patch, LUCENE-6664.patch, 
> LUCENE-6664.patch, usa.png, usa_flat.png
>
>
> Spinoff from LUCENE-6582.
> I created a new SynonymGraphFilter (to replace the current buggy
> SynonymFilter), that produces correct graphs (does no "graph
> flattening" itself).  I think this makes it simpler.
> This means you must add the FlattenGraphFilter yourself, if you are
> applying synonyms during indexing.
> Index-time syn expansion is a necessarily "lossy" graph transformation
> when multi-token (input or output) synonyms are applied, because the
> index does not store {{posLength}}, so there will always be phrase
> queries that should match but do not, and then phrase queries that
> should not match but do.
> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
> goes into detail about this.
> However, with this new SynonymGraphFilter, if instead you do synonym
> expansion at query time (and don't do the flattening), and you use
> TermAutomatonQuery (future: somehow integrated into a query parser),
> or maybe just "enumerate all paths and make union of PhraseQuery", you
> should get 100% correct matches (not sure about "proper" scoring
> though...).
> This new syn filter still cannot consume an arbitrary graph.
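
As an illustration of the "enumerate all paths and make union of PhraseQuery" idea quoted above, the following is a rough sketch in plain Python rather than Lucene code. It assumes the unflattened token graph is given as (term, from_node, to_node) edges and that the graph is acyclic; the enumerate_paths name, the node numbering, and the sample tokens are made up for illustration. Each resulting phrase would correspond to one phrase query in the union.

```python
from collections import defaultdict


def enumerate_paths(tokens, start, end):
    """Return every term sequence that leads from node `start` to node `end`.

    `tokens` is a list of (term, from_node, to_node) edges of an acyclic graph.
    """
    outgoing = defaultdict(list)
    for term, from_node, to_node in tokens:
        outgoing[from_node].append((term, to_node))

    paths = []

    def walk(node, prefix):
        if node == end:
            paths.append(prefix)
            return
        for term, to_node in outgoing[node]:
            walk(to_node, prefix + [term])

    walk(start, [])
    return paths


# Assumed example graph: "usa" spans the same nodes as "united states of america".
tokens = [
    ("visit", 0, 1),
    ("usa", 1, 5),
    ("united", 1, 2),
    ("states", 2, 3),
    ("of", 3, 4),
    ("america", 4, 5),
]

phrases = [" ".join(path) for path in enumerate_paths(tokens, 0, 5)]
for phrase in phrases:
    print(phrase)
```

The number of paths can grow quickly when several multi-token synonyms occur in one query, which is one reason an automaton-based query may be preferable to an explicit union.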



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

