[jira] [Commented] (LUCENE-8361) Make TestRandomChains check that filters preserve positions

Robert Muir (JIRA) Fri, 29 Jun 2018 06:19:11 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-8361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527615#comment-16527615
 ]


Robert Muir commented on LUCENE-8361:
-------------------------------------

{quote}
They're designed to short-cut tokenization (mainly for highlighting, I think) - 
do we have a non-buggy way of not consuming all tokens? Because I can see that 
it's a valid thing to do in some circumstances.
{quote}

Yes, these filters have a boolean option to do this correctly. It's just not the 
default. This is really too bad, since somehow these bugs (which are really 
implementation details of how particular highlighters worked) made their way 
into the analysis module in such a way that it's easy to put wrong offsets into 
your index.
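
To illustrate the point above, here is a minimal sketch of how stopping early
can leave a wrong final offset. It is a simplified model and not Lucene's API:
the class LimitSketch, the Source tokenizer, and finalOffsetAfterLimit are
hypothetical names invented for this example. The consumeAll parameter plays
the role of the boolean option mentioned above (in Lucene this appears to be
the consumeAllTokens argument of filters such as LimitTokenCountFilter).

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a tokenizer plus a "limit" filter. Not Lucene's API.
class LimitSketch {

    /** Whitespace tokenizer that remembers how far into the text it has read. */
    static final class Source {
        private final String text;
        private int pos = 0;

        Source(String text) {
            this.text = text;
        }

        /** Returns the next term, or null when the text is exhausted. */
        String next() {
            while (pos < text.length() && text.charAt(pos) == ' ') {
                pos++;
            }
            if (pos >= text.length()) {
                return null;
            }
            int start = pos;
            while (pos < text.length() && text.charAt(pos) != ' ') {
                pos++;
            }
            return text.substring(start, pos);
        }

        /** Final offset as this source knows it: only accurate once fully consumed. */
        int finalOffset() {
            return pos;
        }
    }

    /** Keeps at most maxTokens terms and returns the final offset reported afterwards. */
    static int finalOffsetAfterLimit(String text, int maxTokens, boolean consumeAll) {
        Source source = new Source(text);
        List<String> kept = new ArrayList<>();
        String term;
        while (kept.size() < maxTokens && (term = source.next()) != null) {
            kept.add(term);
        }
        if (consumeAll) {
            // Drain the remaining tokens so the source reaches the real end of the text.
            while (source.next() != null) {
                // discard
            }
        }
        return source.finalOffset();
    }

    public static void main(String[] args) {
        String text = "a b c d";
        System.out.println("short-cut:   " + finalOffsetAfterLimit(text, 2, false));
        System.out.println("consume all: " + finalOffsetAfterLimit(text, 2, true));
    }
}
```

With the input "a b c d" and a limit of 2, the short-cut variant reports a
final offset of 3, while the draining variant reports 7, the true length of
the text. Draining costs extra tokenization work, which is presumably why the
short-cut exists at all.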

> Make TestRandomChains check that filters preserve positions
> -----------------------------------------------------------
>
>                 Key: LUCENE-8361
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8361
>             Project: Lucene - Core
>          Issue Type: Test
>            Reporter: Adrien Grand
>            Assignee: Alan Woodward
>            Priority: Minor
>         Attachments: LUCENE-8361.patch
>
>
> Follow-up of LUCENE-8360: it is a bit disappointing that we only found this 
> issue because of a newly introduced token filter. I'm wondering whether we
> might be able to make TestRandomChains detect more bugs by verifying that the sum 
> of position increments is preserved through the whole analysis chain.
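
The check proposed in the issue description can be sketched roughly as
follows. This is a simplified, self-contained model and not the attached
patch or Lucene's test framework: PositionSumSketch, Token, Stream,
removeStopWords and positionSum are hypothetical names. It assumes each token
carries a position increment and that the stream reports a trailing increment
at its end; a filter preserves positions when the total is unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of position increments in an analysis chain. Not Lucene's API.
class PositionSumSketch {

    static final class Token {
        final String term;
        final int posInc;

        Token(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    /** Tokens plus the trailing increment a stream reports once it is exhausted. */
    static final class Stream {
        final List<Token> tokens;
        final int endPosInc;

        Stream(List<Token> tokens, int endPosInc) {
            this.tokens = tokens;
            this.endPosInc = endPosInc;
        }
    }

    /** Drops stop words; when preservePositions is false, their increments are lost. */
    static Stream removeStopWords(Stream in, Set<String> stopWords, boolean preservePositions) {
        List<Token> out = new ArrayList<>();
        int skipped = 0;
        for (Token t : in.tokens) {
            if (stopWords.contains(t.term)) {
                if (preservePositions) {
                    skipped += t.posInc; // carry the gap forward
                }
            } else {
                out.add(new Token(t.term, t.posInc + skipped));
                skipped = 0;
            }
        }
        // Gaps left by trailing stop words go into the end-of-stream increment.
        return new Stream(out, in.endPosInc + skipped);
    }

    /** Sum of all position increments, including the trailing one. */
    static int positionSum(Stream s) {
        int sum = s.endPosInc;
        for (Token t : s.tokens) {
            sum += t.posInc;
        }
        return sum;
    }

    static Stream sample() {
        return new Stream(Arrays.asList(
            new Token("the", 1), new Token("quick", 1),
            new Token("fox", 1), new Token("the", 1)), 0);
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the"));
        Stream in = sample();
        int expected = positionSum(in);
        System.out.println("input sum:   " + expected);
        System.out.println("correct sum: " + positionSum(removeStopWords(in, stop, true)));
        System.out.println("buggy sum:   " + positionSum(removeStopWords(in, stop, false)));
    }
}
```

In this model the position-preserving filter keeps the sum at 4, while the
variant that discards the increments of removed tokens drops it to 2, which is
the kind of mismatch a randomized chain test could flag.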



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
