[
https://issues.apache.org/jira/browse/LUCENE-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jim Ferenczi reassigned LUCENE-8137:
------------------------------------
Assignee: Jim Ferenczi
> GraphTokenStreamFiniteStrings does not handle position inc > 1 in multi-word
> synonyms
> ------------------------------------------------------------------------------------
>
> Key: LUCENE-8137
> URL: https://issues.apache.org/jira/browse/LUCENE-8137
> Project: Lucene - Core
> Issue Type: Bug
> Affects Versions: master (8.0), 7.2.1
> Reporter: Jim Ferenczi
> Assignee: Jim Ferenczi
> Priority: Major
>
> The automaton built for graph queries that contain multiple multi-word
> synonyms does not handle gaps (position increment > 1) that appear in the
> middle of a multi-word synonym. In that case, the token next to the gap is
> treated as part of the multi-word synonym.
> Stop words that appear before or after multi-word synonyms are handled
> correctly in the current version. However, the synonym rule
> "part of speech, pos", for instance, does not create the expected query if
> "of" is removed by a filter placed after the synonym_graph filter. One
> solution would be to reuse TokenStreamToAutomaton (with minor changes to add
> the ability to create token transitions rather than char transitions), which
> preserves gaps as transitions in the produced automaton.
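
To make the gap problem concrete, here is a minimal, self-contained sketch of the idea of keeping a gap as its own transition. It does not use Lucene: the `Token`, `Arc`, and `GapAwareGraphSketch` names, the `_` hole label, and the assumed token attributes for "part of speech, pos" after "of" is removed are all hypothetical illustrations, not the actual GraphTokenStreamFiniteStrings or TokenStreamToAutomaton code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these types exist in Lucene.
final class GapAwareGraphSketch {
    // Label used for a transition that stands in for a removed token (a gap).
    static final String HOLE = "_";

    // A token as seen after analysis: term, position increment, position length.
    static final class Token {
        final String term;
        final int posInc;
        final int posLen;

        Token(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    // A labeled transition to a destination position (state).
    static final class Arc {
        final String label;
        final int dest;

        Arc(String label, int dest) {
            this.label = label;
            this.dest = dest;
        }
    }

    // Enumerates every path through the token graph, keeping gaps as HOLE arcs.
    // Assumes every token has posLen >= 1, so all arcs move forward.
    static List<List<String>> finiteStrings(List<Token> tokens) {
        Map<Integer, List<Arc>> arcs = new TreeMap<>();
        int pos = -1;
        int finalState = 0;
        for (Token t : tokens) {
            pos += t.posInc;            // start position of this token
            int end = pos + t.posLen;   // position where this token ends
            arcs.computeIfAbsent(pos, k -> new ArrayList<>()).add(new Arc(t.term, end));
            finalState = Math.max(finalState, end);
        }
        // A position before the final state with no outgoing arc is a gap:
        // bridge it with an explicit HOLE arc instead of skipping over it.
        for (int p = 0; p < finalState; p++) {
            if (!arcs.containsKey(p)) {
                arcs.put(p, new ArrayList<>(Collections.singletonList(new Arc(HOLE, p + 1))));
            }
        }
        List<List<String>> paths = new ArrayList<>();
        collect(arcs, 0, finalState, new ArrayList<>(), paths);
        return paths;
    }

    // Depth-first walk that records the labels along each complete path.
    private static void collect(Map<Integer, List<Arc>> arcs, int state, int finalState,
                                List<String> prefix, List<List<String>> out) {
        if (state == finalState) {
            out.add(new ArrayList<>(prefix));
            return;
        }
        for (Arc a : arcs.getOrDefault(state, Collections.<Arc>emptyList())) {
            prefix.add(a.label);
            collect(arcs, a.dest, finalState, prefix, out);
            prefix.remove(prefix.size() - 1);
        }
    }

    public static void main(String[] args) {
        // Assumed stream for "part of speech" with the rule "part of speech, pos",
        // after a later filter removed "of": "speech" arrives with posInc = 2.
        List<Token> tokens = Arrays.asList(
            new Token("pos", 1, 3),
            new Token("part", 0, 1),
            new Token("speech", 2, 1));
        System.out.println(finiteStrings(tokens)); // prints [[pos], [part, _, speech]]
    }
}
```

With the hole kept as a transition, the second path still spans three positions, so a query built from it can account for the removed "of" rather than treating "speech" as immediately following "part".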
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)