[ https://issues.apache.org/jira/browse/LUCENE-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784812#action_12784812 ]
DM Smith commented on LUCENE-2034:
----------------------------------

bq. But I do not see the benefit compared to the current solution.

In an earlier post we discussed that it'd be possible, like SOLR, to eliminate analyzers in favor of a factory pattern. The benefit of this variation (you are right, it is equivalent) is that it moves in that direction.

bq. To access a default stopword set you have to create an instance of a specific analyzer which is IMO not a very natural way.

It could be made into a singleton (which would have been better in the first place), a static method, or both. I just tossed together one example, albeit an extensive one, to answer. Also, the matchVersion is not needed in the derived classes. So here is an alternative:

{code}
import java.util.Set;
import org.apache.lucene.util.Version;

public class ArabicStopWords extends StopWords {
  // Eagerly created, but only when ArabicStopWords itself is first initialized.
  private static final StopWords instance = new ArabicStopWords();

  private ArabicStopWords() {
    // The derived class fixes the Version itself, so callers never pass matchVersion.
    super(Version.LUCENE_30, null, null, false);
  }

  // Note: if StopWords declares getDefaultStopWords() as an instance method,
  // a static method with the same signature here will not compile; one of
  // the two would need a different name.
  public static Set<?> getDefaultStopWords() {
    return instance.getDefaultStopWords();
  }
}
{code}

bq. I personally prefer the holder pattern as it is guaranteed to be lazy by the JVM.

I'm not sure about this; I think it is only partially true. I know I could look it up to be sure. I thought that the JLS required *all* static initializers to be run at first access to the class. So if one does not want the list of default stopwords, but wants something else in the class or is supplying an alternate set of stopwords, the default stopwords are initialized anyway. So the other benefit of a separate stopword class is that it is fully lazy, though this is a small benefit.

On another note, still regarding code placement: StopFilter has a bunch of makeStopSet methods, WordListLoader has a few more, StopawareAnalyzer has another, and my example has yet another. I think this creates confusion for end users and casual contributors, as it is not clear how to proceed without looking at the code for examples. I'd like to see some kind of clarity/consolidation.
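On the laziness question above: per JLS §12.4.1, a class is initialized on its first active use (creating an instance, invoking one of its static methods, or using one of its non-constant static fields), and a static nested class is a separate class with its own initialization. So a default set kept in a nested holder is built only when it is first requested, even if other static members of the enclosing class are used earlier. The following is a minimal plain-Java sketch, not Lucene code; `HolderDemo`, `DefaultSetHolder`, `getDefaultStopSet()` and the placeholder words are hypothetical names chosen for illustration.

{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for an analyzer that owns a default stopword set.
// Demonstrates the initialization-on-demand holder idiom (not Lucene API).
public class HolderDemo {
    // Set to true by the holder's static initializer, so we can observe when it runs.
    static boolean holderInitialized = false;

    // Another (non-constant) static member of the enclosing class. Using it
    // initializes HolderDemo, but NOT the nested DefaultSetHolder class.
    static final String NAME = computeName();

    private static String computeName() {
        return "demo";
    }

    // The holder: its static initializer runs only on first access to DEFAULT_STOP_SET.
    private static final class DefaultSetHolder {
        static final Set<String> DEFAULT_STOP_SET;
        static {
            holderInitialized = true;
            // Placeholder words; a real analyzer would load its language's list here.
            DEFAULT_STOP_SET = Collections.unmodifiableSet(
                new HashSet<>(Arrays.asList("a", "an", "the")));
        }
    }

    public static Set<String> getDefaultStopSet() {
        return DefaultSetHolder.DEFAULT_STOP_SET;
    }

    public static void main(String[] args) {
        System.out.println(NAME);                                    // demo
        System.out.println("holder initialized: " + holderInitialized); // false
        System.out.println(getDefaultStopSet().size());              // 3
        System.out.println("holder initialized: " + holderInitialized); // true
    }
}
{code}

The design point is where the set lives: a private static final field declared directly on the analyzer would be built as soon as the analyzer class is initialized, whereas the nested holder defers it until `getDefaultStopSet()` is actually called.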
> Massive Code Duplication in Contrib Analyzers - unifly the analyzer ctors
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-2034
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2034
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>    Affects Versions: 2.9
>            Reporter: Simon Willnauer
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-2034,patch, LUCENE-2034,patch, LUCENE-2034.patch, LUCENE-2034.patch, LUCENE-2034.patch, LUCENE-2034.patch, LUCENE-2034.patch, LUCENE-2034.txt
>
>
> Due to the various tokenStream APIs we had in Lucene, analyzer subclasses
> need to implement at least one of the methods returning a tokenStream. When
> you look at the code, it appears to be almost identical if both are
> implemented in the same analyzer. Each analyzer defines the same inner class
> (SavedStreams), which is unnecessary.
> In contrib, almost every analyzer uses stopwords, and each of them creates its
> own way of loading them or defines a large number of ctors to load stopwords
> from a file, set, arrays, etc. Those ctors should be deprecated and
> eventually removed.

--
This message is automatically generated by JIRA.