[
https://issues.apache.org/jira/browse/LUCENE-10008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365142#comment-17365142
]
Chris M. Hostetter commented on LUCENE-10008:
---------------------------------------------
{quote}Should we add a new base class common for {{Stop/KeepWord/CommonGrams}}
to parse these args ...
{quote}
Yeah ... that's what that comment is suggesting: a new (abstract) base class
injected into the hierarchy that those 3 concrete classes can share as a
common parent.
Something like...
{code:java}
public abstract class AbstractWordsFileFilterFactory extends TokenFilterFactory
    implements ResourceLoaderAware {

  private CharArraySet words; // nocommit: also provide public accessor
  private final String wordFiles; // nocommit: also provide public accessor
  private final String format; // nocommit: also provide public accessor
  private final boolean ignoreCase; // nocommit: also provide public accessor

  // nocommit: jdocs
  public AbstractWordsFileFilterFactory(Map<String, String> args) {
    super(args);
    wordFiles = get(args, "words");
    format = get(args, "format");
    ignoreCase = getBoolean(args, "ignoreCase", false);
  }

  // nocommit: jdocs
  @Override
  public void inform(ResourceLoader loader) throws IOException {
    // nocommit: mostly verbatim from current StopFilterFactory
    // nocommit: but replace direct use of ENGLISH_STOP_WORDS_SET in "default" codepath with...
    // ... } else { ...; words = createDefaultWords(); }
  }

  // nocommit: jdocs
  protected CharArraySet createDefaultWords() {
    // nocommit: KeepWordFilterFactory should override this method to return null
    return new CharArraySet(EnglishAnalyzer.ENGLISH_STOP_WORDS_SET, ignoreCase);
  }
}
{code}
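To make the override hook concrete, here is a minimal stand-alone sketch of the same shape. It deliberately uses no Lucene types: {{WordsFactorySketch}}, {{StopLikeFactory}}, {{KeepLikeFactory}} and {{DEFAULT_WORDS}} are hypothetical names invented for illustration, and a {{TreeSet}} ordered by {{String.CASE_INSENSITIVE_ORDER}} is only an assumed stand-in for a {{CharArraySet}} built with {{ignoreCase=true}}.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the proposed abstract base class (no Lucene types).
abstract class WordsFactorySketch {
  // Assumed stand-in for EnglishAnalyzer.ENGLISH_STOP_WORDS_SET.
  static final Set<String> DEFAULT_WORDS = Set.of("a", "an", "the");

  protected final boolean ignoreCase;
  private Set<String> words;

  WordsFactorySketch(boolean ignoreCase) {
    this.ignoreCase = ignoreCase;
  }

  // Stand-in for inform(ResourceLoader); only the "no words file" path is modeled.
  void inform() {
    words = createDefaultWords();
  }

  Set<String> getWords() {
    return words;
  }

  // Default hook: copy the defaults into a set that honors ignoreCase.
  protected Set<String> createDefaultWords() {
    Set<String> copy =
        ignoreCase ? new TreeSet<>(String.CASE_INSENSITIVE_ORDER) : new TreeSet<>();
    copy.addAll(DEFAULT_WORDS);
    return Collections.unmodifiableSet(copy);
  }
}

// Inherits the default hook, as the Stop/CommonGrams factories would.
class StopLikeFactory extends WordsFactorySketch {
  StopLikeFactory(boolean ignoreCase) {
    super(ignoreCase);
  }
}

// Overrides the hook to return null, as suggested for KeepWordFilterFactory.
class KeepLikeFactory extends WordsFactorySketch {
  KeepLikeFactory(boolean ignoreCase) {
    super(ignoreCase);
  }

  @Override
  protected Set<String> createDefaultWords() {
    return null;
  }
}
```

With this shape, the subclasses that want English defaults get the {{ignoreCase}} handling for free, and the keep-word variant opts out in a single override.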
> CommonGramsFilterFactory doesn't respect ignoreCase=true when default stopwords are used
> ----------------------------------------------------------------------------------------
>
> Key: LUCENE-10008
> URL: https://issues.apache.org/jira/browse/LUCENE-10008
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Chris M. Hostetter
> Priority: Major
>
> CommonGramsFilterFactory's use of the "words" and "ignoreCase" config options
> is inconsistent with how StopFilterFactory uses them - leading to
> "ignoreCase=true" not being respected unless "words" is specified...
> StopFilterFactory...
> {code:java}
>   public void inform(ResourceLoader loader) throws IOException {
>     if (stopWordFiles != null) {
>       ...
>     } else {
>       ...
>       stopWords = new CharArraySet(EnglishAnalyzer.ENGLISH_STOP_WORDS_SET, ignoreCase);
>     }
>   }
> {code}
> CommonGramsFilterFactory...
> {code:java}
>   @Override
>   public void inform(ResourceLoader loader) throws IOException {
>     if (commonWordFiles != null) {
>       ...
>     } else {
>       commonWords = EnglishAnalyzer.ENGLISH_STOP_WORDS_SET;
>     }
>   }
> {code}
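The difference between the two branches quoted above can be reproduced without Lucene. The following is only a rough illustration under stated assumptions: {{IgnoreCaseDefaultsDemo}}, {{DEFAULTS}} and both method names are hypothetical, and a case-insensitive {{TreeSet}} stands in for a {{CharArraySet}} constructed with {{ignoreCase=true}}.

```java
import java.util.Set;
import java.util.TreeSet;

class IgnoreCaseDefaultsDemo {
  // Assumed stand-in for EnglishAnalyzer.ENGLISH_STOP_WORDS_SET (lowercase entries).
  static final Set<String> DEFAULTS = Set.of("the", "of", "and");

  // Mirrors the CommonGramsFilterFactory branch: the shared set is used as-is,
  // so the ignoreCase argument is never consulted.
  static Set<String> defaultsUsedDirectly(boolean ignoreCase) {
    return DEFAULTS;
  }

  // Mirrors the StopFilterFactory branch: the defaults are copied into a set
  // that is built with the ignoreCase setting.
  static Set<String> defaultsCopied(boolean ignoreCase) {
    Set<String> copy =
        ignoreCase ? new TreeSet<>(String.CASE_INSENSITIVE_ORDER) : new TreeSet<>();
    copy.addAll(DEFAULTS);
    return copy;
  }

  public static void main(String[] args) {
    System.out.println(defaultsUsedDirectly(true).contains("The")); // prints false
    System.out.println(defaultsCopied(true).contains("The")); // prints true
  }
}
```

In this sketch, only the copied set matches a mixed-case token when {{ignoreCase}} is true, which is the behavior gap the issue describes.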