[
https://issues.apache.org/jira/browse/LUCENE-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501245
]
Steven Rowe commented on LUCENE-902:
------------------------------------
Hi Toru,
I looked at your patch (though I didn't test it), and I noticed that it uses
generics and varargs, both Java 1.5 features. Lucene core targets Java 1.4, so
your patch needs to be rewritten to use only Java 1.4 features.
I think I understand what you're going for (filtering out all tokens at the
same position as a stopword), and I think it's a useful addition to Lucene,
since the naive "fix", i.e. employing a StopFilter in a processing pipeline
before a morphological analyzer, will negatively impact the morphological
analyzer's performance.
However, this behavior should not be the default - StopFilter's current
behavior is well-defined and depended on by lots of people. I think there are
(at least :) ) two possible courses of action here:
1. Include a getter/setter for a boolean field controlling whether to filter
out tokens at the same position as stopwords (call it, say,
"removeStopwordCollocates", where I mean "collocate", as a noun, to denote
tokens with the same position). This field would be initialized to false, to
preserve existing behavior.
2. Change StopFilter to allow extension (remove the "final" in "public final
class StopFilter ..."), and create a new class extending StopFilter that
exhibits the behavior you want. This could start life in the sandbox.
I like option #1 better - this functionality, were it available, would quite
likely be useful to a significant proportion of Lucene's user base (albeit
skewed toward non-Lucene-as-black-box users).
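To make option #1 concrete, here is a rough, self-contained sketch of the idea. It is not Lucene code and not the attached patch: StopFilterSketch, its Token class, and the filter() method are hypothetical stand-ins, and only the field name "removeStopwordCollocates" comes from the suggestion above. It also uses generics for brevity, which a real Java 1.4 patch could not do, and it returns only the surviving terms without adjusting their position increments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch only: none of these types are Lucene's real classes.
class StopFilterSketch {

    /** Minimal stand-in for a token: a term plus its position increment. */
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    private final Set<String> stopWords;

    // Defaults to false so that the existing behavior is preserved.
    private boolean removeStopwordCollocates = false;

    StopFilterSketch(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    boolean getRemoveStopwordCollocates() {
        return removeStopwordCollocates;
    }

    void setRemoveStopwordCollocates(boolean removeStopwordCollocates) {
        this.removeStopwordCollocates = removeStopwordCollocates;
    }

    /** Returns the terms that survive filtering, in their original order. */
    List<String> filter(List<Token> tokens) {
        List<String> kept = new ArrayList<>();
        List<Token> group = new ArrayList<>(); // tokens sharing one position
        for (Token t : tokens) {
            // A positive increment starts a new position: flush the old one.
            if (t.positionIncrement > 0 && !group.isEmpty()) {
                flush(group, kept);
                group.clear();
            }
            group.add(t);
        }
        flush(group, kept);
        return kept;
    }

    private void flush(List<Token> group, List<String> kept) {
        boolean hasStopword = false;
        for (Token t : group) {
            if (stopWords.contains(t.term)) {
                hasStopword = true;
            }
        }
        // With the flag set, one stopword removes every token at its position.
        if (removeStopwordCollocates && hasStopword) {
            return;
        }
        // Otherwise only the stopwords themselves are dropped.
        for (Token t : group) {
            if (!stopWords.contains(t.term)) {
                kept.add(t.term);
            }
        }
    }

    public static void main(String[] args) {
        StopFilterSketch f =
            new StopFilterSketch(new HashSet<>(Arrays.asList("the")));
        // "teh" stands in for an extra token emitted at the same position as "the".
        List<Token> tokens = Arrays.asList(
            new Token("the", 1), new Token("teh", 0),
            new Token("quick", 1), new Token("fast", 0),
            new Token("fox", 1));

        System.out.println(f.filter(tokens)); // prints [teh, quick, fast, fox]
        f.setRemoveStopwordCollocates(true);
        System.out.println(f.filter(tokens)); // prints [quick, fast, fox]
    }
}
```

Buffering each position's tokens before deciding means a stopword is caught even when it is not the first token emitted at that position; a streaming filter would need comparable lookahead.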
> Check on PositionIncrement with StopFilter.
> --------------------------------------------
>
> Key: LUCENE-902
> URL: https://issues.apache.org/jira/browse/LUCENE-902
> Project: Lucene - Java
> Issue Type: Bug
> Components: Analysis
> Affects Versions: 2.2
> Reporter: Toru Matsuzawa
> Attachments: stopfilter.patch, stopfilter20070604.patch
>
>
> StopFilter does not take into account the PositionIncrement set by the Tokenizer.
> When a Token with a PositionIncrement of 1 is a stopword, StopFilter deletes it.
> However, a following Token with a PositionIncrement of 0 is not deleted.
> I think it should be deleted as well, because a Token with a PositionIncrement
> of 0 is regarded as the same Token as the one before it.