[
https://issues.apache.org/jira/browse/LUCENE-8361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527615#comment-16527615
]
Robert Muir commented on LUCENE-8361:
-------------------------------------
{quote}
They're designed to short-cut tokenization (mainly for highlighting, I think) -
do we have a non-buggy way of not consuming all tokens? Because I can see that
it's a valid thing to do in some circumstances.
{quote}
Yes, these filters have a boolean option to do this correctly; it's just not the
default. This is really too bad, since these bugs (which are really
implementation details of how particular highlighters worked) somehow made their
way into the analysis module in such a way that it's easy to put wrong offsets
into your index.
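A toy Java model of the trade-off above. It assumes that "these filters" means the token-limiting filters (for example, Lucene's LimitTokenCountFilter, whose constructor accepts a consumeAllTokens boolean). ToyTokenizer, ToyLimitFilter and LimitDemo are hypothetical classes, not Lucene API, and finalOffset() only stands in for the final offset a real tokenizer reports from end(). With consumeAllTokens=false the filter stops pulling from its input early, so the reported final offset covers only the text that was actually read.

```java
// Toy model only: these classes are hypothetical and are NOT Lucene's TokenStream API.

/** Splits on spaces and remembers how far into the text it has read. */
class ToyTokenizer {
    private final String text;
    private int pos = 0; // index of the next unread character

    ToyTokenizer(String text) {
        this.text = text;
    }

    /** Returns the next space-separated token, or null once the input is exhausted. */
    String next() {
        while (pos < text.length() && text.charAt(pos) == ' ') pos++;
        if (pos >= text.length()) return null;
        int start = pos;
        while (pos < text.length() && text.charAt(pos) != ' ') pos++;
        return text.substring(start, pos);
    }

    /** Stand-in for the final offset reported at end(): counts only text actually read. */
    int finalOffset() {
        return pos;
    }
}

/** Emits at most maxTokens tokens; if consumeAllTokens is set, drains the rest of the input. */
class ToyLimitFilter {
    private final ToyTokenizer input;
    private final int maxTokens;
    private final boolean consumeAllTokens;
    private int emitted = 0;

    ToyLimitFilter(ToyTokenizer input, int maxTokens, boolean consumeAllTokens) {
        this.input = input;
        this.maxTokens = maxTokens;
        this.consumeAllTokens = consumeAllTokens;
    }

    String next() {
        if (emitted < maxTokens) {
            String tok = input.next();
            if (tok != null) emitted++;
            return tok;
        }
        if (consumeAllTokens) {
            // Read (and discard) the remaining tokens so the tokenizer reaches the end of the text.
            while (input.next() != null) { }
        }
        return null; // short-circuit: stop emitting tokens
    }

    int finalOffset() {
        return input.finalOffset();
    }
}

class LimitDemo {
    /** Runs the toy chain to exhaustion and returns the final offset it reports. */
    static int finalOffsetAfter(String text, int maxTokens, boolean consumeAllTokens) {
        ToyLimitFilter filter = new ToyLimitFilter(new ToyTokenizer(text), maxTokens, consumeAllTokens);
        while (filter.next() != null) { }
        return filter.finalOffset();
    }

    public static void main(String[] args) {
        String text = "quick brown fox jumps"; // 21 characters
        System.out.println(finalOffsetAfter(text, 2, false)); // prints 11
        System.out.println(finalOffsetAfter(text, 2, true));  // prints 21
    }
}
```

With the sample text, the short-circuited chain reports a final offset of 11 instead of 21. Roughly speaking, the indexer uses the final offset reported at end() as the base for the offsets of the next value in a multi-valued field, so a short final offset shifts every later value's offsets.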
> Make TestRandomChains check that filters preserve positions
> -----------------------------------------------------------
>
> Key: LUCENE-8361
> URL: https://issues.apache.org/jira/browse/LUCENE-8361
> Project: Lucene - Core
> Issue Type: Test
> Reporter: Adrien Grand
> Assignee: Alan Woodward
> Priority: Minor
> Attachments: LUCENE-8361.patch
>
>
> Follow-up of LUCENE-8360: it is a bit disappointing that we only found this
> issue because of a newly introduced token filter. I'm wondering whether we might
> be able to make TestRandomChains detect more bugs by verifying that the sum
> of position increments is preserved through the whole analysis chain.
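A toy Java sketch of the proposed check, not TestRandomChains' actual code: add up every token's position increment plus the trailing increment reported at end(), and compare the totals before and after a filter. Tok, StreamResult, PositionCheck and stopFilter are hypothetical names. The position-preserving variant carries the increments of removed tokens forward (onto the next kept token, or into the end() increment for trailing ones); the buggy variant simply drops them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model only: hypothetical types, NOT Lucene's Token or TokenStream classes.

/** A term plus its position increment. */
record Tok(String term, int posInc) {}

/** Emitted tokens plus the trailing position increment reported by end(). */
record StreamResult(List<Tok> tokens, int endPosInc) {
    /** Sum of all position increments, including the one reported at end(). */
    int totalPositions() {
        int sum = endPosInc;
        for (Tok t : tokens) sum += t.posInc();
        return sum;
    }
}

class PositionCheck {
    /** Splits on single spaces; every token advances the position by one. */
    static StreamResult tokenize(String text) {
        List<Tok> toks = new ArrayList<>();
        for (String w : text.split(" ")) toks.add(new Tok(w, 1));
        return new StreamResult(toks, 0);
    }

    /** Removes stop words; preserveGaps decides whether their increments are kept. */
    static StreamResult stopFilter(StreamResult in, Set<String> stop, boolean preserveGaps) {
        List<Tok> out = new ArrayList<>();
        int pending = 0; // increments of removed tokens not yet carried forward
        for (Tok t : in.tokens()) {
            if (stop.contains(t.term())) {
                pending += t.posInc();
            } else {
                int inc = preserveGaps ? t.posInc() + pending : t.posInc();
                out.add(new Tok(t.term(), inc));
                pending = 0;
            }
        }
        // Trailing removed tokens must show up in the increment reported at end().
        int endInc = in.endPosInc() + (preserveGaps ? pending : 0);
        return new StreamResult(out, endInc);
    }

    /** The proposed invariant: a filter must not change the total number of positions. */
    static boolean preservesPositions(StreamResult before, StreamResult after) {
        return before.totalPositions() == after.totalPositions();
    }

    public static void main(String[] args) {
        StreamResult in = tokenize("the quick fox of the");
        Set<String> stop = Set.of("the", "of");
        System.out.println(preservesPositions(in, stopFilter(in, stop, true)));  // prints true
        System.out.println(preservesPositions(in, stopFilter(in, stop, false))); // prints false
    }
}
```

The buggy variant fails the check even though the tokens it emits look plausible on their own, which is the kind of bug the proposal hopes a randomized chain test could catch without needing a new filter to expose it.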
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)