[ https://issues.apache.org/jira/browse/LUCENE-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16685878#comment-16685878 ]
Alan Woodward commented on LUCENE-8564:
---------------------------------------

bq. How does it handle a graph where one of the side paths itself then splits (after a token or two) into its own set of side paths? We'd end up with extra routes through the graph available via incrementGraph()

Let's imagine a TokenStream that looks like this:

z a/b:4 c d/e:2 f g h

Starting at position z, calling incrementGraphToken() repeatedly will yield the token stream

z a c d f g h

Then we call incrementGraph(); now calling incrementGraphToken() gives us the following, taking the split at d/e:

z a c e g h

Call incrementGraph() again; we get

z b g h

Now that all routes have been exhausted, calling incrementGraph() will return false.

How many routes are available depends on how far down the graph you move. If in the example above you only advance as far as 'z a c' on the first branch, then incrementGraph() will move directly to the 'z b g' branch, because the d/e split was never reached.

> Make it easier to iterate over graphs in tokenstreams
> -----------------------------------------------------
>
>                 Key: LUCENE-8564
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8564
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8564.patch
>
>
> We have a number of TokenFilters that read ahead in the token stream (eg
> synonyms, shingles) and ideally these would understand token graphs as well
> as linear streams. FixedShingleFilter already has some mechanisms to deal
> with graphs; this issue is to extract this logic into a GraphTokenStream
> class that can then be reused by other token filters

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
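The route order in Alan's walk-through of the "z a/b:4 c d/e:2 f g h" example can be reproduced with a small standalone Java sketch. This is a toy model, not Lucene's GraphTokenStream: the Tok record, the ToyGraphWalker class and its methods nextToken() (standing in for incrementGraphToken()) and nextPath() (standing in for incrementGraph()) are all hypothetical. It assumes that ":N" gives the position length of the second token at that position (so b spans four positions and e spans two), and that the next route comes from depth-first backtracking to the deepest split already read.

```java
import java.util.ArrayList;
import java.util.List;

public class GraphPathsDemo {
    // Hypothetical token model: term, start position, number of positions spanned.
    record Tok(String term, int pos, int posLen) {}

    // Toy walker, NOT Lucene's GraphTokenStream.
    // nextToken() ~ incrementGraphToken(), nextPath() ~ incrementGraph().
    static final class ToyGraphWalker {
        private final List<Tok> tokens;
        private final int startPos;
        // One frame per token read on the current route: {position, chosen alternative}.
        private final List<int[]> frames = new ArrayList<>();
        private int step;    // tokens read so far on the current route
        private int nextPos; // position the next token must start at

        ToyGraphWalker(List<Tok> tokens) {
            this.tokens = tokens;
            this.startPos = tokens.get(0).pos();
            this.nextPos = startPos;
        }

        private List<Tok> alternativesAt(int pos) {
            List<Tok> alts = new ArrayList<>();
            for (Tok t : tokens) if (t.pos() == pos) alts.add(t);
            return alts;
        }

        /** Next token on the current route, or null when the route ends. */
        Tok nextToken() {
            List<Tok> alts = alternativesAt(nextPos);
            if (alts.isEmpty()) return null;
            // First visit to this step: take the first alternative.
            if (step == frames.size()) frames.add(new int[] {nextPos, 0});
            Tok t = alts.get(frames.get(step)[1]);
            step++;
            nextPos = t.pos() + t.posLen();
            return t;
        }

        /** Moves to the next route (restarting from the beginning); false when none remain. */
        boolean nextPath() {
            // Only splits actually reached on the current route can be backtracked to.
            while (frames.size() > step) frames.remove(frames.size() - 1);
            for (int i = frames.size() - 1; i >= 0; i--) {
                int[] f = frames.get(i);
                if (f[1] + 1 < alternativesAt(f[0]).size()) {
                    f[1]++; // try the next alternative at the deepest open split
                    while (frames.size() > i + 1) frames.remove(frames.size() - 1);
                    step = 0;
                    nextPos = startPos;
                    return true;
                }
            }
            return false;
        }
    }

    // "z a/b:4 c d/e:2 f g h"
    static List<Tok> example() {
        return List.of(new Tok("z", 0, 1), new Tok("a", 1, 1), new Tok("b", 1, 4),
                new Tok("c", 2, 1), new Tok("d", 3, 1), new Tok("e", 3, 2),
                new Tok("f", 4, 1), new Tok("g", 5, 1), new Tok("h", 6, 1));
    }

    static String readRoute(ToyGraphWalker w) {
        List<String> terms = new ArrayList<>();
        for (Tok t = w.nextToken(); t != null; t = w.nextToken()) terms.add(t.term());
        return String.join(" ", terms);
    }

    /** Reads every route to its end before asking for the next one. */
    static List<String> allRoutes(List<Tok> tokens) {
        ToyGraphWalker w = new ToyGraphWalker(tokens);
        List<String> routes = new ArrayList<>();
        do {
            routes.add(readRoute(w));
        } while (w.nextPath());
        return routes;
    }

    /** Reads only n tokens of the first route, then returns the route nextPath() selects. */
    static String routeAfterPartialRead(List<Tok> tokens, int n) {
        ToyGraphWalker w = new ToyGraphWalker(tokens);
        for (int i = 0; i < n; i++) w.nextToken();
        return w.nextPath() ? readRoute(w) : null;
    }

    public static void main(String[] args) {
        System.out.println(allRoutes(example()));
        System.out.println(routeAfterPartialRead(example(), 3));
    }
}
```

Under these assumptions, allRoutes(example()) yields "z a c d f g h", "z a c e g h" and "z b g h" in that order. Reading only three tokens ("z a c") before calling nextPath() goes straight to "z b g h", since the d/e split was never recorded. Depth-first backtracking was chosen here only because it matches the order described in the comment; the actual patch may track routes differently.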