[ https://issues.apache.org/jira/browse/LUCENE-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803320#comment-16803320 ]
Nicolás Lichtmaier commented on LUCENE-8137:
--------------------------------------------

I'm hitting this issue while trying to use synonyms and stop words at the
same time, yet this is a duplicate of a resolved issue (SOLR-11968). This
issue breaks the combination of synonyms and stop words. If there is a
workaround, it would be great to have it documented in the bug. Thanks.

> GraphTokenStreamFiniteStrings does not handle position inc > 1 in multi-word
> synonyms
> ------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8137
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8137
>             Project: Lucene - Core
>          Issue Type: Bug
>   Affects Versions: 7.2.1, 8.0
>            Reporter: Jim Ferenczi
>            Assignee: Jim Ferenczi
>            Priority: Major
>
> The automaton built for graph queries that contain multiple multi-word
> synonyms does not handle gaps if they appear in the middle of a multi-word
> synonym. In such a case the token next to the gap is considered part of the
> multi-word synonym.
> Stop words that appear before or after multi-word synonyms are handled
> correctly in the current version, but the synonym rule "part of speech, pos",
> for instance, does not create the expected query if "of" is removed by a
> filter that is placed after the synonym_graph filter. One solution would be
> to reuse TokenStreamToAutomaton (with minor changes to add the ability to
> create token transitions rather than chars), which preserves gaps (as a
> transition) in the produced automaton.
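
The idea of keeping a gap as its own transition can be illustrated with a
small toy model. This is a sketch in plain Python, not Lucene code: the
function names, the "<gap>" label and the token attributes (an assumed
token stream for "part of speech" after the rule "part of speech, pos" has
been applied and "of" removed) are all hypothetical. Each token is
(term, position increment, position length), and paths are enumerated from
the first position to the last.

```python
from collections import defaultdict

HOLE = "<gap>"  # hypothetical label for a position left empty by a removed token

# Assumed token stream: "pos" spans three positions, "part" shares its start,
# and "speech" arrives with a position increment of 2 because "of" is gone.
TOKENS = [("pos", 1, 3), ("part", 0, 1), ("speech", 2, 1)]


def build_arcs(tokens, preserve_gaps):
    """Map each start position to its outgoing (end position, label) arcs."""
    arcs = defaultdict(list)
    pos = -1
    for term, pos_inc, pos_len in tokens:
        if preserve_gaps:
            # Add one explicit transition per skipped position.
            for skipped in range(pos + 1, pos + pos_inc):
                arcs[skipped].append((skipped + 1, HOLE))
        pos += pos_inc
        arcs[pos].append((pos + pos_len, term))
    return arcs


def paths(arcs, state, end):
    """Enumerate every label sequence leading from state to end."""
    if state == end:
        return [[]]
    result = []
    for to_state, label in arcs.get(state, []):
        for rest in paths(arcs, to_state, end):
            result.append([label] + rest)
    return result


def all_paths(tokens, preserve_gaps):
    arcs = build_arcs(tokens, preserve_gaps)
    end = max(to for outs in arcs.values() for to, _ in outs)
    return paths(arcs, 0, end)


print(all_paths(TOKENS, preserve_gaps=True))
print(all_paths(TOKENS, preserve_gaps=False))
```

With the gap transition, the toy graph keeps both the "pos" path and the
"part", gap, "speech" path. Without it, "part" has no route to "speech" in
this model. The actual failure in GraphTokenStreamFiniteStrings is the one
described in the issue; the sketch only shows why an explicit gap transition
keeps the multi-word path connected and correctly spaced.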