Jim Ferenczi created LUCENE-7824:
------------------------------------

             Summary: Multi-word synonyms rule with common terms at the same 
position are buggy
                 Key: LUCENE-7824
                 URL: https://issues.apache.org/jira/browse/LUCENE-7824
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Jim Ferenczi


The automaton built from the graph token stream tries to pack common terms in 
multi-word synonyms that appear at the same position. This means that some 
states inside a multi-word synonym can have multiple transitions.
As a result, the intersection points of the graph are not computed correctly.

For example, the synonym rule "ny, new york city, new york" is not applied 
correctly to the query "ny police".
In this case "police" is detected as part of the multi-word synonym path, and 
we create the disjunction between:
 "ny police", "new york police", ...
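
As a rough illustration of the packing described above, here is a toy model in 
Python. It is not Lucene's automaton code: the state numbering, the `build` 
helper, and the `branching_inner_states` helper are all invented for this 
sketch. It builds the synonym paths for the rule above twice, once sharing a 
transition whenever two paths have the same term at the same state, and once 
giving every path its own chain of states, then reports which states inside a 
path end up with more than one outgoing transition.

```python
from collections import defaultdict

START, END = 0, 1  # hypothetical state ids for the synonym's entry and exit
PATHS = [["ny"], ["new", "york", "city"], ["new", "york"]]


def build(paths, pack_common_terms):
    """Return {state: [(term, next_state), ...]} for the given synonym paths."""
    transitions = defaultdict(list)
    next_free = END + 1
    for path in paths:
        state = START
        for i, term in enumerate(path):
            if i == len(path) - 1:
                # The last term of every path leads to the shared exit state.
                transitions[state].append((term, END))
                break
            shared = None
            if pack_common_terms:
                # Reuse an existing transition on the same term, if any.
                shared = next(
                    (dst for t, dst in transitions[state]
                     if t == term and dst != END),
                    None,
                )
            if shared is None:
                shared = next_free
                next_free += 1
                transitions[state].append((term, shared))
            state = shared
    return dict(transitions)


def branching_inner_states(transitions):
    """States other than START that have more than one outgoing transition."""
    return sorted(
        s for s, out in transitions.items() if s != START and len(out) > 1
    )


packed = build(PATHS, pack_common_terms=True)
unpacked = build(PATHS, pack_common_terms=False)
print("packed:  ", branching_inner_states(packed))
print("unpacked:", branching_inner_states(unpacked))
```

In this toy model the packed variant leaves the state reached after "new" with 
two transitions on "york" (one continuing to "city", one leaving the synonym), 
while the unpacked variant has exactly one transition from every state inside 
a path.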

I pushed a patch that removes this optimization (and creates a single 
transition from each state) in order to ensure that the intersection points of 
the graph always show up at the end of the multi-word synonym paths.
[~mattweber] can you take a look?



