Why does SpellCheckCollator want to ignore tokens with incorrect offsets?

On Fri, Feb 8, 2019 at 10:35 AM Alan Woodward (JIRA) <[email protected]> wrote:

>
>     [ https://issues.apache.org/jira/browse/SOLR-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763700#comment-16763700 ]
>
> Alan Woodward commented on SOLR-13233:
> --------------------------------------
>
> I'm honestly not sure what the correct fix here is; possibly we should
> change WordDelimiterGraphFilter to emit its original token first, and
> check our other TokenFilters to ensure that they all have this behaviour?
>
> > SpellCheckCollator ignores stacked tokens
> > -----------------------------------------
> >
> >                 Key: SOLR-13233
> >                 URL: https://issues.apache.org/jira/browse/SOLR-13233
> >             Project: Solr
> >          Issue Type: Bug
> >      Security Level: Public(Default Security Level. Issues are Public)
> >            Reporter: Alan Woodward
> >            Priority: Major
> >
> > When building collations, SpellCheckCollator ignores any tokens with a
> > position increment of 0, assuming that they've been injected and may
> > therefore have incorrect offsets (injected terms generally keep the offsets
> > of the terms they're replacing, as they don't themselves appear anywhere in
> > the original source).  However, this assumption is not necessarily correct:
> > for example, WordDelimiterGraphFilter emits stacked tokens *before* the
> > original token, because it needs to iterate through all stacked tokens to
> > correctly set the original token's position length.
>
>
>
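A minimal Java sketch of the rule the issue describes, under stated assumptions: Tok is a hypothetical stand-in for the term, position-increment and offset attributes a Lucene TokenStream exposes (it is not Lucene's API), keptTerms models only the "skip tokens with position increment 0" rule from the description rather than SpellCheckCollator's actual code, and the two "wi-fi" token orders are illustrative, not WordDelimiterGraphFilter's exact output. It shows that the rule keys on emission order, not on whether offsets are correct: the original token has accurate offsets (0-5) but is dropped when it arrives after a stacked token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified token: only the attributes the issue talks about.
// NOT Lucene's API; real code reads CharTermAttribute,
// PositionIncrementAttribute and OffsetAttribute from a TokenStream.
record Tok(String term, int posInc, int startOffset, int endOffset) {}

final class CollatorSketch {

    // Models the rule from the issue description: any token with a position
    // increment of 0 is assumed to be injected and is skipped, so only the
    // first token emitted at each position is kept, whatever its offsets.
    static List<String> keptTerms(List<Tok> stream) {
        List<String> kept = new ArrayList<>();
        for (Tok t : stream) {
            if (t.posInc() == 0) {
                continue; // stacked token: ignored even if its offsets are correct
            }
            kept.add(t.term());
        }
        return kept;
    }

    public static void main(String[] args) {
        // Illustrative analysis of the input "wi-fi" (offsets 0..5).
        // Stacked token emitted first: the original ends up with posInc 0.
        List<Tok> stackedFirst = List.of(
            new Tok("wi", 1, 0, 2),
            new Tok("wi-fi", 0, 0, 5),
            new Tok("fi", 1, 3, 5));
        // Original emitted first: a generated part ends up with posInc 0.
        List<Tok> originalFirst = List.of(
            new Tok("wi-fi", 1, 0, 5),
            new Tok("wi", 0, 0, 2),
            new Tok("fi", 1, 3, 5));
        System.out.println(keptTerms(stackedFirst));  // [wi, fi]
        System.out.println(keptTerms(originalFirst)); // [wi-fi, fi]
    }
}
```

Emitting the original first, as suggested in the comment, means the original token survives and a generated part is skipped instead; whether that yields correct collations also depends on how the collator uses offsets, which this sketch does not model.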
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)