Why does SpellCheckCollator want to ignore tokens with incorrect offsets?

On Fri, Feb 8, 2019 at 10:35 AM Alan Woodward (JIRA) <[email protected]> wrote:
>
>     [ https://issues.apache.org/jira/browse/SOLR-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763700#comment-16763700 ]
>
> Alan Woodward commented on SOLR-13233:
> --------------------------------------
>
> I'm honestly not sure what the correct fix here is - possibly we should
> change WordDelimiterGraphFilter to emit its original token first? And
> check our other TokenFilters to ensure that they all have this behaviour?
>
> > SpellCheckCollator ignores stacked tokens
> > -----------------------------------------
> >
> >                 Key: SOLR-13233
> >                 URL: https://issues.apache.org/jira/browse/SOLR-13233
> >             Project: Solr
> >          Issue Type: Bug
> >      Security Level: Public (Default Security Level. Issues are Public)
> >            Reporter: Alan Woodward
> >            Priority: Major
> >
> > When building collations, SpellCheckCollator ignores any tokens with a
> > position increment of 0, assuming that they've been injected and may
> > therefore have incorrect offsets (injected terms generally keep the
> > offsets of the terms they're replacing, as they don't themselves appear
> > anywhere in the original source). However, this assumption is not
> > necessarily correct - for example, WordDelimiterGraphFilter emits
> > stacked tokens *before* the original token, because it needs to iterate
> > through all stacked tokens to correctly set the original token's
> > position length.
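
To make the quoted description concrete, here is a minimal sketch of why
"skip every token with a position increment of 0" depends on emission order.
It is not Solr's actual code: the Tok class, the keepNonStacked method, and
the token streams are hypothetical stand-ins, and the streams are not real
WordDelimiterGraphFilter output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StackedTokenSketch {
    // Hypothetical stand-in for a token: term text plus position increment.
    static final class Tok {
        final String term;
        final int posInc;
        Tok(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    // Mimics the behaviour described in SOLR-13233: any token with a
    // position increment of 0 is assumed to be injected and is dropped.
    static List<String> keepNonStacked(List<Tok> stream) {
        List<String> kept = new ArrayList<>();
        for (Tok t : stream) {
            if (t.posInc != 0) {
                kept.add(t.term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Original token first, injected token stacked on it: the original survives.
        List<Tok> originalFirst = Arrays.asList(new Tok("wi-fi", 1), new Tok("wifi", 0));
        // Injected token first, original stacked on it: the original is dropped.
        List<Tok> stackedFirst = Arrays.asList(new Tok("wifi", 1), new Tok("wi-fi", 0));

        System.out.println(keepNonStacked(originalFirst)); // prints [wi-fi]
        System.out.println(keepNonStacked(stackedFirst));  // prints [wifi]
    }
}
```

In the second stream the token that actually appears in the source text is
the one thrown away, which is the situation the issue describes.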
