[ 
https://issues.apache.org/jira/browse/LUCENE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008305#comment-16008305
 ] 

Jim Ferenczi commented on LUCENE-7824:
--------------------------------------

I don't think we should try to optimize here. The number of terms in a query 
should be small, so I would prefer to keep it simple and just create a new 
entry for each token, as the cached token stream does.

> Multi-word synonyms rule with common terms at the same position are buggy
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-7824
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7824
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Jim Ferenczi
>         Attachments: LUCENE-7824.patch
>
>
> The automaton built from the graph token stream tries to pack common terms in 
> multi-word synonyms that appear at the same position. This means that some 
> states inside a multi-word synonym can have multiple transitions.
> As a result, the intersection points of the graph are not computed correctly.
> For example, the synonym rule "ny, new york city, new york" is not applied 
> correctly to the query "ny police".
> In this case "police" is detected as part of the multi-word synonym path and we 
> create the disjunction between:
>  "ny police", "new york police", ...
> I pushed a patch that removes this optimization (and creates a single transition 
> from each state) in order to ensure that the intersection points of the graph 
> always show up at the end of the multi-word synonym paths.
> [~mattweber] can you take a look?
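
To make the description above concrete, here is a minimal sketch in Python. It 
is not Lucene's code and does not use any Lucene API: build_graph, 
interior_branching, START and END are hypothetical names, and the "packing" 
rule (reuse an existing transition that carries the same term from the same 
state) is a simplifying assumption about what sharing common terms means. It 
only shows that packing "new york city" and "new york" leaves a state inside 
the synonym with two transitions, while creating a new state for each token 
does not.

```python
from collections import defaultdict

START, END = 0, 1  # hypothetical ids: every synonym path runs from START to END

# Alternatives produced by the rule "ny, new york city, new york".
ALTERNATIVES = [["ny"], ["new", "york", "city"], ["new", "york"]]


def build_graph(alternatives, share_common_terms):
    """Return {state: [(term, next_state), ...]} for the given paths.

    With share_common_terms=True, a non-final term reuses an existing
    transition with the same term from the same state (the assumed packing).
    With False, every non-final token gets a fresh state.
    """
    transitions = defaultdict(list)
    next_state = 2
    for path in alternatives:
        src = START
        for i, term in enumerate(path):
            if i == len(path) - 1:
                # The last term of every alternative lands on the shared END.
                transitions[src].append((term, END))
                src = END
                continue
            dst = None
            if share_common_terms:
                dst = next(
                    (d for t, d in transitions[src] if t == term and d != END),
                    None,
                )
            if dst is None:
                dst = next_state
                next_state += 1
                transitions[src].append((term, dst))
            src = dst
    return transitions


def interior_branching(transitions):
    """States other than START that have more than one outgoing transition."""
    return sorted(
        s for s, out in transitions.items() if s != START and len(out) > 1
    )


packed = build_graph(ALTERNATIVES, share_common_terms=True)
unpacked = build_graph(ALTERNATIVES, share_common_terms=False)
print(interior_branching(packed))
print(interior_branching(unpacked))
```

In the packed graph the state reached after "new" carries two "york" 
transitions, one continuing to "city" and one going straight to END. In the 
unpacked graph only START branches, so all alternatives meet again at END.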



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

