[ 
https://issues.apache.org/jira/browse/LUCENE-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16684996#comment-16684996
 ] 

Alan Woodward commented on LUCENE-8564:
---------------------------------------

Here is a patch adding a GraphTokenStream class.  The class wraps an underlying 
token stream, and then exposes tokens via the following methods:
- incrementBaseToken() : moves the starting point of the graph forwards
- incrementGraphToken() : moves along the currently selected path through the 
token graph
- incrementGraph() : resets back to the base token and selects the next path 
to move along; returns false once all paths have been exhausted
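
As a rough illustration of how these three methods fit together, here is a 
self-contained toy model in plain Java. It is not the code from the patch and 
does not use any Lucene classes: ToyGraphCursor, Tok, shingles() and sample() 
are hypothetical names, and the token graph is a simple in-memory list of 
(term, position, position length) triples. Only the three method names and 
their described behaviour come from the comment above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: NOT the GraphTokenStream from the patch.
class ToyGraphCursor {
    // Hypothetical token: term, start position, number of positions spanned.
    static final class Tok {
        final String term; final int pos; final int len;
        Tok(String term, int pos, int len) { this.term = term; this.pos = pos; this.len = len; }
    }

    private final List<Tok> tokens;
    private final List<Integer> choices = new ArrayList<>(); // branch taken at each step
    private final List<Integer> fanout = new ArrayList<>();  // branches available at each step
    private int baseIndex = -1;
    private int depth;
    private Tok current;

    ToyGraphCursor(List<Tok> tokens) { this.tokens = tokens; }

    // Moves the starting point of the graph forwards.
    boolean incrementBaseToken() {
        baseIndex++;
        choices.clear(); fanout.clear(); depth = 0;
        if (baseIndex >= tokens.size()) return false;
        current = tokens.get(baseIndex);
        return true;
    }

    // Moves along the currently selected path.
    boolean incrementGraphToken() {
        int next = current.pos + current.len;
        List<Tok> candidates = new ArrayList<>();
        for (Tok t : tokens) if (t.pos == next) candidates.add(t);
        if (candidates.isEmpty()) return false;
        if (depth == choices.size()) { choices.add(0); fanout.add(candidates.size()); }
        current = candidates.get(choices.get(depth));
        depth++;
        return true;
    }

    // Resets to the base token and selects the next path; false when exhausted.
    boolean incrementGraph() {
        int last = choices.size() - 1;
        // Drop trailing steps that have no untried branch left.
        while (last >= 0 && choices.get(last) + 1 >= fanout.get(last)) {
            choices.remove(last); fanout.remove(last); last--;
        }
        if (last < 0) return false;
        choices.set(last, choices.get(last) + 1);
        current = tokens.get(baseIndex);
        depth = 0;
        return true;
    }

    // Fixed-size shingles over every path from every base token.
    static List<String> shingles(List<Tok> tokens, int size) {
        ToyGraphCursor g = new ToyGraphCursor(tokens);
        List<String> out = new ArrayList<>();
        while (g.incrementBaseToken()) {
            do {
                StringBuilder sb = new StringBuilder(g.current.term);
                int n = 1;
                while (n < size && g.incrementGraphToken()) {
                    sb.append(' ').append(g.current.term);
                    n++;
                }
                if (n == size) out.add(sb.toString());
            } while (g.incrementGraph());
        }
        return out;
    }

    // "fast wi fi network" where "wifi" spans the two positions of "wi fi".
    static List<Tok> sample() {
        return Arrays.asList(new Tok("fast", 0, 1), new Tok("wi", 1, 1),
            new Tok("wifi", 1, 2), new Tok("fi", 2, 1), new Tok("network", 3, 1));
    }

    public static void main(String[] args) {
        // prints [fast wi, fast wifi, wi fi, wifi network, fi network]
        System.out.println(shingles(sample(), 2));
    }
}
```

The outer loop advances the base token, the do/while loop tries each path 
from that base, and the inner loop walks along one path. The toy scans the 
whole list to find successors; a real implementation would read ahead in the 
underlying stream instead.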

The patch also reimplements FixedShingleFilter on top of GraphTokenStream, to 
illustrate how much easier the graph-handling logic becomes to reason about.

To protect against misuse, there are hard limits on how far ahead in the stream 
tokens will be read and cached, and on the number of paths through the graph 
that can be followed from a single base token.
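
One way to picture the read-ahead limit is a small bounded look-ahead buffer. 
The sketch below is a generic illustration, not the patch's implementation: 
BoundedLookahead, peek(), advance() and the maxCached parameter are 
hypothetical names, and throwing IllegalStateException when the limit is 
exceeded is an assumption, since the comment does not say how the limit is 
enforced or what its value is.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: caches items read ahead of the cursor, up to a hard cap.
class BoundedLookahead<T> {
    private final Iterator<T> source;
    private final int maxCached;               // assumed limit, value chosen by caller
    private final List<T> cache = new ArrayList<>();

    BoundedLookahead(Iterator<T> source, int maxCached) {
        this.source = source;
        this.maxCached = maxCached;
    }

    // Returns the item `offset` places ahead, or null if the source ends first.
    T peek(int offset) {
        if (offset >= maxCached) {
            // Assumption: exceeding the limit is treated as an error.
            throw new IllegalStateException("look-ahead limit of " + maxCached + " exceeded");
        }
        while (cache.size() <= offset && source.hasNext()) {
            cache.add(source.next());
        }
        return offset < cache.size() ? cache.get(offset) : null;
    }

    // Consumes and returns the next item, or null at end of input.
    T advance() {
        peek(0);
        return cache.isEmpty() ? null : cache.remove(0);
    }
}
```

A cap on the number of paths per base token could be enforced the same way, 
by counting calls that select a new path and failing once the count passes a 
fixed maximum.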

> Make it easier to iterate over graphs in tokenstreams
> -----------------------------------------------------
>
>                 Key: LUCENE-8564
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8564
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8564.patch
>
>
> We have a number of TokenFilters that read ahead in the token stream (e.g. 
> synonyms, shingles), and ideally these would understand token graphs as well 
> as linear streams.  FixedShingleFilter already has some mechanisms to deal 
> with graphs; this issue is to extract this logic into a GraphTokenStream 
> class that can then be reused by other token filters.


