[
https://issues.apache.org/jira/browse/LUCENE-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373757#comment-16373757
]
Robert Muir commented on LUCENE-4065:
-------------------------------------
Yeah, I mean we should split it up. It's probably more important to figure out
what this thing should be doing in the graph case (positionLengths). Because
before the stopfilter I've got a total of 3 positions (1 + 0 + 1 + 1). Today
the stopfilter deletes "the" and transfers its position increment to "twd", so
I've still got 3 positions (1 + 1 + 1).
But your testcase argues that this should be 4 positions (1 + 2 + 1). I'm just
not convinced that's the correct behavior: it's unintuitive to me that a
stopfilter would make a document "longer" in the sense of actually adding
additional positions... (no, it doesn't impact length normalization because
this value isn't used for that, but it's just really confusing).
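
To make the arithmetic above concrete, here is a minimal sketch in plain
Python (not the Lucene API). It models a token as a (term, position increment)
pair and assumes "the" is the first token and "twd" is stacked on it with
increment 0; the terms "w3" and "w4" are hypothetical placeholders, since the
comment only names "the" and "twd".

```python
from typing import List, Set, Tuple

Token = Tuple[str, int]  # (term, position increment); a simplified model


def total_positions(tokens: List[Token]) -> int:
    """Number of positions spanned by the stream: the sum of all increments."""
    return sum(inc for _, inc in tokens)


def remove_with_transfer(tokens: List[Token], stopwords: Set[str]) -> List[Token]:
    """Drop stopwords and add each dropped increment to the next kept token."""
    kept: List[Token] = []
    pending = 0  # increments of removed tokens not yet handed to a kept token
    for term, inc in tokens:
        if term in stopwords:
            pending += inc
        else:
            kept.append((term, inc + pending))
            pending = 0
    return kept


# Assumed layout matching the increments 1 + 0 + 1 + 1 from the comment.
before = [("the", 1), ("twd", 0), ("w3", 1), ("w4", 1)]
after = remove_with_transfer(before, {"the"})

print(total_positions(before))  # 3
print(after)                    # [('twd', 1), ('w3', 1), ('w4', 1)]
print(total_positions(after))   # 3

# The increments the testcase is said to expect, 1 + 2 + 1, sum to 4.
print(sum([1, 2, 1]))           # 4
```

Under these assumptions the transfer keeps the total at 3 positions, while the
increments 1 + 2 + 1 would give 4, which is the "longer document" effect
questioned above.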
> FilteringTokenFilter should never corrupt the tokenstream graph
> ---------------------------------------------------------------
>
> Key: LUCENE-4065
> URL: https://issues.apache.org/jira/browse/LUCENE-4065
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/analysis
> Reporter: Robert Muir
> Priority: Major
> Attachments: LUCENE-4065_test.patch
>
>
> Currently removers like stopfilter have an option (true/false) to enable
> position increments.
> If it's true: it both inserts gaps where necessary AND propagates gaps down
> the stream.
> If it's false: it does neither, which can totally mess up the tokenstream
> graph (e.g. move synonyms to another word).
> There are totally valid natural use cases for false, where you don't want gaps
> because you want phrasequeries to act as if the word was never actually there.
> But 'not inserting gaps' is separate from proper propagation of existing gaps.
> So I think we should provide an option (either fix 'false' or make it an
> enum), where you still get a legit tokenstream and don't totally screw it up,
> but you simply omit gaps.
> See LUCENE-3848 for more information (where we at least fixed this case to
> not begin the tokenstream with posinc=0).
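
The three behaviors discussed in the description can be sketched in plain
Python (again, not the Lucene API). The mode names "gaps", "naive" and
"no_gaps" are hypothetical labels, and "no_gaps" is only one possible reading
of the proposed option: tokens that sat at different positions in the input
stay at different positions, but no holes are left behind. All terms other
than "the" and "twd" are made up for illustration.

```python
from typing import List, Optional, Set, Tuple

Token = Tuple[str, int]  # (term, position increment); a simplified model


def filter_tokens(tokens: List[Token], stopwords: Set[str], mode: str) -> List[Token]:
    """Remove stopwords under one of three hypothetical modes.

    "gaps":    removed increments are added to the next kept token.
    "naive":   tokens are dropped and kept increments are left untouched.
    "no_gaps": kept tokens get increment 1 when their original position differs
               from the previous kept token's original position, else 0.
    """
    if mode not in ("gaps", "naive", "no_gaps"):
        raise ValueError("unknown mode: " + mode)
    kept: List[Token] = []
    pending = 0                          # used by "gaps"
    pos = 0                              # absolute position in the input stream
    last_kept_pos: Optional[int] = None  # used by "no_gaps"
    for term, inc in tokens:
        pos += inc
        if term in stopwords:
            pending += inc
            continue
        if mode == "gaps":
            kept.append((term, inc + pending))
        elif mode == "naive":
            kept.append((term, inc))
        else:
            kept.append((term, 0 if pos == last_kept_pos else 1))
        pending = 0
        last_kept_pos = pos
    return kept


# "twd" is stacked on "the" (increment 0); "of" has nothing stacked on it.
stream = [("quick", 1), ("the", 1), ("twd", 0), ("w3", 1), ("of", 1), ("w5", 1)]
stop = {"the", "of"}

print(filter_tokens(stream, stop, "gaps"))
# [('quick', 1), ('twd', 1), ('w3', 1), ('w5', 2)]
print(filter_tokens(stream, stop, "naive"))
# [('quick', 1), ('twd', 0), ('w3', 1), ('w5', 1)]
print(filter_tokens(stream, stop, "no_gaps"))
# [('quick', 1), ('twd', 1), ('w3', 1), ('w5', 1)]
```

In the "naive" output "twd" ends up with increment 0 directly after "quick",
so the synonym has moved onto another word. In this sketch "no_gaps" avoids
that and also never starts the stream with increment 0, while still leaving no
hole where "of" was removed.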