[jira] [Commented] (LUCENE-4065) FilteringTokenFilter should never corrupt the tokenstream graph

Robert Muir (JIRA) Wed, 21 Feb 2018 13:53:50 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372072#comment-16372072
 ]


Robert Muir commented on LUCENE-4065:
-------------------------------------

Yeah, you've got it. I really prefer your {{enableGaps}} name. Sorry, the issue 
is confusing and I was struggling to explain it.

Today {{enableGaps}} is always true, which makes deletions pretty simple for 
FilteringTokenFilter. We just have to track an int variable! 

But I think we can potentially support {{enableGaps=false}} and adjust 
positionIncrements/positionLengths so that the result is sane. That's the idea 
of this issue. I don't think any user _really_ wanted to disable position 
increments entirely; nobody wants synonyms moved to the wrong words or 
anything like that. They just want control over whether there are gaps or not, 
because it impacts things like phrase queries.
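
To make the "track an int variable" point concrete, here is a minimal sketch in plain Java, not Lucene's actual code. The {{Tok}} class, the method names, and the sample tokens are all hypothetical stand-ins for a real TokenStream. It contrasts carrying removed increments forward (gaps enabled) with simply dropping tokens unadjusted, which is roughly what the old "false" setting amounted to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class GapTrackingSketch {
    /** Simplified token (hypothetical type, not a Lucene class): term plus position increment. */
    static final class Tok {
        final String term;
        final int posInc;
        Tok(String term, int posInc) { this.term = term; this.posInc = posInc; }
    }

    /** Gaps enabled: one int carries the increments of removed tokens to the next kept token. */
    static List<Tok> filterWithGaps(List<Tok> in, Set<String> stop) {
        List<Tok> out = new ArrayList<>();
        int skippedPositions = 0;
        for (Tok t : in) {
            if (stop.contains(t.term)) {
                skippedPositions += t.posInc;
            } else {
                out.add(new Tok(t.term, t.posInc + skippedPositions));
                skippedPositions = 0;
            }
        }
        return out;
    }

    /** Naive removal with no adjustment at all: stacked tokens can slide onto a neighbor. */
    static List<Tok> filterNaive(List<Tok> in, Set<String> stop) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            if (!stop.contains(t.term)) {
                out.add(t);
            }
        }
        return out;
    }

    /** Renders absolute positions as "term@pos", counting from -1 before the first token. */
    static List<String> positions(List<Tok> toks) {
        List<String> out = new ArrayList<>();
        int pos = -1;
        for (Tok t : toks) {
            pos += t.posInc;
            out.add(t.term + "@" + pos);
        }
        return out;
    }

    public static void main(String[] args) {
        // "one" is stacked on the stop word "a" (posInc = 0), as a synonym filter might emit it.
        List<Tok> in = List.of(new Tok("see", 1), new Tok("a", 1), new Tok("one", 0), new Tok("fox", 1));
        Set<String> stop = Set.of("a");
        System.out.println(positions(filterWithGaps(in, stop))); // [see@0, one@1, fox@2]
        System.out.println(positions(filterNaive(in, stop)));    // [see@0, one@0, fox@1]
    }
}
```

In the naive output the synonym "one" ends up at the same position as "see", which is the kind of graph corruption this issue is about.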

> FilteringTokenFilter should never corrupt the tokenstream graph
> ---------------------------------------------------------------
>
>                 Key: LUCENE-4065
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4065
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>            Reporter: Robert Muir
>            Priority: Major
>         Attachments: LUCENE-4065_test.patch
>
>
> Currently removers like StopFilter have an option (true/false) to enable 
> position increments.
> If it's true: it both inserts gaps where necessary AND propagates gaps down 
> the stream.
> If it's false: it does neither, which can totally mess up the tokenstream 
> graph (e.g. move synonyms to another word).
> There are totally valid natural use cases for false, where you don't want gaps 
> because you want phrase queries to act as if the word was never actually there.
> But 'not inserting gaps' is separate from proper propagation of existing gaps.
> So I think we should provide an option (either fix 'false' or make it an 
> enum), where you still get a legit tokenstream and don't totally screw it up, 
> but you simply omit gaps.
> See LUCENE-3848 for more information (where we at least fixed this case to 
> not begin the tokenstream with posinc=0)
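
One possible reading of "omit gaps but still propagate existing ones" is sketched below in plain Java. This is an assumption about the intended semantics, not the committed fix: the {{Tok}} class and {{filterWithoutGaps}} are hypothetical names, and the sketch ignores positionLength, which a real change would also have to adjust. The rule it applies is that a position disappears only when every token at that position is removed. A surviving stacked token takes over the position, and any gap that was already present upstream (posInc > 1) is carried forward.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class GaplessFilterSketch {
    /** Simplified token (hypothetical type, not a Lucene class): term plus position increment. */
    static final class Tok {
        final String term;
        final int posInc;
        Tok(String term, int posInc) { this.term = term; this.posInc = posInc; }
    }

    /** Removes stop words without opening new gaps, while keeping stacked tokens in place. */
    static List<Tok> filterWithoutGaps(List<Tok> in, Set<String> stop) {
        List<Tok> out = new ArrayList<>();
        int carried = 0;         // upstream gaps inherited from fully removed positions
        boolean pending = false; // the current position has no surviving token so far
        int pendingInc = 0;      // increment of the removed token that opened that position
        for (Tok t : in) {
            boolean keep = !stop.contains(t.term);
            if (t.posInc > 0) {
                if (pending) {
                    // The previous position vanished: drop it, keep only its upstream gap.
                    carried += pendingInc - 1;
                    pending = false;
                }
                if (keep) {
                    out.add(new Tok(t.term, t.posInc + carried));
                    carried = 0;
                } else {
                    pending = true;
                    pendingInc = t.posInc;
                }
            } else if (keep) {
                if (pending) {
                    // A stacked survivor takes over the position its removed sibling opened.
                    out.add(new Tok(t.term, pendingInc + carried));
                    carried = 0;
                    pending = false;
                } else {
                    out.add(t);
                }
            }
        }
        return out;
    }

    /** Renders absolute positions as "term@pos", counting from -1 before the first token. */
    static List<String> positions(List<Tok> toks) {
        List<String> out = new ArrayList<>();
        int pos = -1;
        for (Tok t : toks) {
            pos += t.posInc;
            out.add(t.term + "@" + pos);
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stop = Set.of("a", "the");
        // Stop word with a stacked synonym: the synonym keeps its own position.
        System.out.println(positions(filterWithoutGaps(
            List.of(new Tok("see", 1), new Tok("a", 1), new Tok("one", 0), new Tok("fox", 1)), stop)));
        // prints [see@0, one@1, fox@2]
        // Plain stop word: no gap is left behind.
        System.out.println(positions(filterWithoutGaps(
            List.of(new Tok("see", 1), new Tok("the", 1), new Tok("fox", 1)), stop)));
        // prints [see@0, fox@1]
        // Stop word that arrived with an upstream gap (posInc = 2): that gap is propagated.
        System.out.println(positions(filterWithoutGaps(
            List.of(new Tok("see", 1), new Tok("the", 2), new Tok("fox", 1)), stop)));
        // prints [see@0, fox@2]
    }
}
```

A stream whose first token is removed also comes out starting with posInc=1 under this rule, so it avoids the posinc=0 start that LUCENE-3848 addressed.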



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

