[ 
https://issues.apache.org/jira/browse/LUCENE-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12501245
 ] 

Steven Rowe commented on LUCENE-902:
------------------------------------

Hi Toru,

I looked at your patch (though I didn't test it), and I noticed that it uses 
generics and varargs, both Java 1.5 features.  Lucene core targets Java 1.4, so 
your patch needs to be rewritten to use only Java 1.4 features.

I think I understand what you're going for (filtering out all tokens at the 
same position as a stopword), and I think it's a useful addition to Lucene, 
since the naive "fix", i.e. employing a StopFilter in a processing pipeline 
before a morphological analyzer, will negatively impact the morphological 
analyzer's performance.  

However, this behavior should not be the default - StopFilter's current 
behavior is well-defined and depended on by lots of people.  I think there are 
(at least :) ) two possible courses of action here:

1. Include a getter/setter for a boolean field controlling whether to filter 
out tokens at the same position as stopwords (call it, say, 
"removeStopwordCollocates", where I mean "collocate", as a noun, to denote 
tokens with the same position).  This field would be initialized to false, to 
preserve existing behavior.

2. Change StopFilter to allow extension (remove the "final" in "public final 
class StopFilter ..."), and create a new class extending StopFilter that 
exhibits the behavior you want.  This could start life in the sandbox.

I like option #1 better - this functionality, were it available, would quite 
likely be useful to a significant proportion of Lucene's user base (albeit 
skewed toward non-Lucene-as-black-box users).
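
For illustration only, option #1 might be sketched roughly as below. This is 
not the attached patch and it does not use Lucene's classes: Token, filter() 
and the array-based interface are hypothetical stand-ins, and only the flag 
name "removeStopwordCollocates" comes from the suggestion above. The sketch 
also assumes that only tokens arriving after the stopword at the same position 
are dropped (the case described in the issue), and it leaves position 
increments untouched. It sticks to Java 1.4 features.

```java
// Hypothetical, self-contained sketch; none of these classes are Lucene's.
// Uses Java 1.4 features only (no generics, varargs, or for-each loops).
class StopwordCollocateSketch {

    /** Minimal stand-in for a token: a term plus its position increment. */
    static class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Linear scan is enough for a sketch; a real filter would use a set. */
    static boolean isStopword(String term, String[] stopwords) {
        for (int i = 0; i < stopwords.length; i++) {
            if (stopwords[i].equals(term)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Drops stopwords. When removeStopwordCollocates is true, also drops
     * later tokens that share the stopword's position (increment == 0).
     * With the flag false, only the stopwords themselves are removed.
     */
    static Token[] filter(Token[] in, String[] stopwords,
                          boolean removeStopwordCollocates) {
        Token[] kept = new Token[in.length];
        int n = 0;
        boolean droppingPosition = false;
        for (int i = 0; i < in.length; i++) {
            Token t = in[i];
            if (t.positionIncrement > 0) {
                droppingPosition = false;   // a new position starts here
            }
            if (isStopword(t.term, stopwords)) {
                droppingPosition = removeStopwordCollocates;
                continue;                   // stopwords are always removed
            }
            if (droppingPosition) {
                continue;                   // collocate of a removed stopword
            }
            kept[n++] = t;
        }
        Token[] out = new Token[n];
        System.arraycopy(kept, 0, out, 0, n);
        return out;
    }

    /** Joins the terms with single spaces, for easy inspection. */
    static String terms(Token[] tokens) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(tokens[i].term);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "be" and "run" stand for lemmas stacked on "is" and "running".
        Token[] in = new Token[] {
            new Token("fox", 1), new Token("is", 1), new Token("be", 0),
            new Token("running", 1), new Token("run", 0)
        };
        String[] stop = new String[] { "is" };
        System.out.println(terms(filter(in, stop, false))); // fox be running run
        System.out.println(terms(filter(in, stop, true)));  // fox running run
    }
}
```

With the flag left at false, the lemma "be" survives even though its surface 
form "is" was removed, which is the behavior the issue reports; setting the 
flag to true removes both.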


> Check on PositionIncrement  with StopFilter.
> --------------------------------------------
>
>                 Key: LUCENE-902
>                 URL: https://issues.apache.org/jira/browse/LUCENE-902
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.2
>            Reporter: Toru Matsuzawa
>         Attachments: stopfilter.patch, stopfilter20070604.patch
>
>
> StopFilter does not take into account the PositionIncrement set by the 
> Tokenizer. When a stopword Token has a PositionIncrement of 1, StopFilter 
> deletes it. However, a following Token with a PositionIncrement of 0 is not 
> deleted. I think it should be deleted as well, because a Token with a 
> PositionIncrement of 0 is regarded as the same Token as the one before it.


