[ https://issues.apache.org/jira/browse/LUCENE-8916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888781#comment-16888781 ]

ASF subversion and git services commented on LUCENE-8916:
---------------------------------------------------------

Commit 1ccef967677d4eeab4c162b7c0d6eeb81ebd5281 in lucene-solr's branch 
refs/heads/master from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1ccef96 ]

LUCENE-8916: GraphTokenStreamFiniteStrings preserves all attributes


> GraphTokenStreamFiniteStrings.FiniteStringsTokenStream does not play well 
> with subsequent TokenFilters
> ------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8916
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8916
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> GraphTokenStreamFiniteStrings (GTSFS) provides a view over multiple paths 
> through a token graph, which is useful when building queries over synonyms 
> of differing lengths.  This view is exposed as an iterator over simple 
> TokenStreams.  However, these TokenStreams do not work correctly when 
> further wrapped in token filters, because they do not use a 
> CharTermAttribute.
> For an example of the issues this can cause, see 
> https://github.com/elastic/elasticsearch/issues/43976, where Elasticsearch 
> uses a special shingle field to speed up phrase searches.  Queries are 
> converted to shingles if they have multiple terms.  However, if the query 
> resolves into a graph due to synonyms, this conversion breaks because the 
> FixedShingleFilter (FSF) is given a token stream built by GTSFS: terms are 
> set using BytesTermAttribute but then read using CharTermAttribute, and as 
> these have different backing implementations, FSF ends up emitting null 
> tokens.
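
The mismatch described in the quoted issue can be illustrated with a small 
model.  This is only a sketch in plain Java with no Lucene dependency: 
BytesTerm, CharTerm, TokenState and the produce/read helpers are hypothetical 
stand-ins invented for illustration, not Lucene classes, and produceBoth is 
one possible remedy rather than a description of what the commit does.  Where 
the issue reports null tokens, this simplified model shows an empty term 
instead.

```java
import java.nio.charset.StandardCharsets;

public class AttributeMismatchSketch {

    /** Hypothetical stand-in for a term attribute backed by a byte array. */
    public static final class BytesTerm {
        private byte[] bytes = new byte[0];
        public void set(byte[] value) { bytes = value.clone(); }
        public byte[] get() { return bytes.clone(); }
    }

    /** Hypothetical stand-in for a term attribute backed by a char buffer. */
    public static final class CharTerm {
        private final StringBuilder chars = new StringBuilder();
        public void set(String value) { chars.setLength(0); chars.append(value); }
        public String get() { return chars.toString(); }
    }

    /** Per-token state shared by a producer and the filters wrapping it. */
    public static final class TokenState {
        public final BytesTerm bytesTerm = new BytesTerm();
        public final CharTerm charTerm = new CharTerm();
    }

    /** Producer that writes the term only to the byte-backed attribute. */
    public static void produceBytesOnly(TokenState state, String term) {
        state.bytesTerm.set(term.getBytes(StandardCharsets.UTF_8));
    }

    /** Producer that writes the term to both attributes (assumed remedy). */
    public static void produceBoth(TokenState state, String term) {
        state.bytesTerm.set(term.getBytes(StandardCharsets.UTF_8));
        state.charTerm.set(term);
    }

    /** Downstream filter that reads only the char-backed attribute. */
    public static String readAsChars(TokenState state) {
        return state.charTerm.get();
    }

    public static void main(String[] args) {
        TokenState bytesOnly = new TokenState();
        produceBytesOnly(bytesOnly, "wifi");
        System.out.println("bytes-only producer: [" + readAsChars(bytesOnly) + "]");

        TokenState both = new TokenState();
        produceBoth(both, "wifi");
        System.out.println("both attributes:     [" + readAsChars(both) + "]");
    }
}
```

In this model the downstream reader sees nothing from the bytes-only 
producer, because the two attributes do not share storage; once the producer 
also fills the char-backed attribute, the reader sees the term.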



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

