[ https://issues.apache.org/jira/browse/LUCENE-8916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888780#comment-16888780 ]
ASF subversion and git services commented on LUCENE-8916:
---------------------------------------------------------
Commit 1eb2a26c6cc9346827a321c3f883f17ea94ea013 in lucene-solr's branch
refs/heads/branch_8x from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1eb2a26 ]
LUCENE-8916: GraphTokenStreamFiniteStrings preserves all attributes
> GraphTokenStreamFiniteStrings.FiniteStringsTokenStream does not play well
> with subsequent TokenFilters
> ------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-8916
> URL: https://issues.apache.org/jira/browse/LUCENE-8916
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Time Spent: 50m
> Remaining Estimate: 0h
>
> GraphTokenStreamFiniteStrings provides a view over multiple paths through a
> token graph, which is useful when building queries over multi-token
> synonyms. This view is exposed as an iterator over simple TokenStreams.
> However, these TokenStreams do not work correctly when further wrapped in
> token filters, because they do not use a CharTermAttribute.
> For an example of the issues this can cause, see
> https://github.com/elastic/elasticsearch/issues/43976. Elasticsearch
> uses a special shingle field to speed up phrase searches: queries with
> multiple terms are converted to shingles. However, if the query resolves
> into a graph because of synonyms, this conversion breaks. The
> FixedShingleFilter (FSF) is given a token stream built by
> GraphTokenStreamFiniteStrings (GTSFS), which sets terms using
> BytesTermAttribute; FSF then reads them using CharTermAttribute. Because
> these two attributes have different backing implementations, FSF ends up
> emitting null tokens.
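The attribute mismatch described above can be illustrated with a toy model outside Lucene. In the sketch below, `CharTermView`, `BytesTermView` and the `*Impl` classes are hypothetical stand-ins, not Lucene's `AttributeSource` API: a producer writes a term through a bytes view and a downstream consumer reads it back through a char view. When each view has its own backing object, the consumer sees nothing (the analogue of the null tokens); when both views share one backing object, the term survives. This only illustrates the failure mode and does not claim to reproduce how the LUCENE-8916 commit fixed it.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of attribute "views" on a token; these are NOT Lucene classes.
public class AttributeMismatchDemo {

    // Stand-in for a char-based term view (hypothetical, loosely modeled on CharTermAttribute).
    interface CharTermView {
        String chars();
        void setChars(String s);
    }

    // Stand-in for a bytes-based term view (hypothetical, loosely modeled on BytesTermAttribute).
    interface BytesTermView {
        byte[] bytes();
        void setBytes(byte[] b);
    }

    // Backs only the bytes view.
    static final class BytesOnlyImpl implements BytesTermView {
        private byte[] bytes;
        public byte[] bytes() { return bytes; }
        public void setBytes(byte[] b) { bytes = b; }
    }

    // Backs only the char view; it never sees writes made through BytesOnlyImpl.
    static final class CharOnlyImpl implements CharTermView {
        private String chars;
        public String chars() { return chars; }
        public void setChars(String s) { chars = s; }
    }

    // Backs both views with one field, so a write through either view is visible through the other.
    static final class SharedTermImpl implements CharTermView, BytesTermView {
        private String chars;
        public String chars() { return chars; }
        public void setChars(String s) { chars = s; }
        public byte[] bytes() { return chars == null ? null : chars.getBytes(StandardCharsets.UTF_8); }
        public void setBytes(byte[] b) { chars = (b == null) ? null : new String(b, StandardCharsets.UTF_8); }
    }

    // Mismatched setup: each view resolves to its own independent implementation.
    static Map<Class<?>, Object> separateViews() {
        Map<Class<?>, Object> views = new HashMap<>();
        views.put(BytesTermView.class, new BytesOnlyImpl());
        views.put(CharTermView.class, new CharOnlyImpl());
        return views;
    }

    // Shared setup: both views resolve to the same implementation instance.
    static Map<Class<?>, Object> sharedViews() {
        SharedTermImpl impl = new SharedTermImpl();
        Map<Class<?>, Object> views = new HashMap<>();
        views.put(BytesTermView.class, impl);
        views.put(CharTermView.class, impl);
        return views;
    }

    // The producer writes the term as bytes; the consumer reads it back as chars.
    static String roundTrip(Map<Class<?>, Object> views, String term) {
        BytesTermView producer = (BytesTermView) views.get(BytesTermView.class);
        producer.setBytes(term.getBytes(StandardCharsets.UTF_8));
        CharTermView consumer = (CharTermView) views.get(CharTermView.class);
        return consumer.chars();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(separateViews(), "wifi")); // prints "null"
        System.out.println(roundTrip(sharedViews(), "wifi"));   // prints "wifi"
    }
}
```

With Java 11 or later this can be run directly as `java AttributeMismatchDemo.java`. The design point is simply that two views only agree on a token's term if they resolve to the same underlying state.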
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)