[ https://issues.apache.org/jira/browse/LUCENE-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785923#action_12785923 ]
Simon Willnauer commented on LUCENE-2034:
-----------------------------------------

bq. I'm not sure about this. I think this is a partially true statement. I know I could look it up to be sure. I thought that the JLS required all static initializers to be run at first access to the class. So if one does not want the list of default stopwords, but wants something else in the class or is supplying an alternate set of stopwords, the default stopwords are initialized anyway.

DM, what you say is true, but the holder is a static inner class, and its static initializers run on the first access to the holder itself, not to the enclosing analyzer. That is exactly when they need to run, because the holder is only accessed once you use the default stopwords. No explicit synchronization is required, since the JVM guarantees that class initialization happens exactly once. What I like about it is that you can't introduce any synchronization problems: it is simple and declarative.

bq. So the other benefit is that it is fully lazy. Though this is a small benefit.

See above.

bq. It could be made into a singleton (which would have been better in the first place), or static or both. I just tossed together one example, though extensive, to answer. Also, the matchVersion is not needed in the derived classes.

It already is a singleton; the holder makes it a lazily loaded static final singleton. MatchVersion will only be needed in derived classes if the tokenStreamComponents

I personally don't like the various ways you can load stopwords either, but my approach is a different one. Stopwords are mainly used in analyzers and filters, and we have a standard way to load them in StopawareAnalyzer if you implement your own analyzer. If you use an analyzer, you should use WordlistLoader. If we fix WordlistLoader to return Set<?>, we are good to go: a single way for the user and a standard way for making a stopword-aware analyzer. If you wrap this up in a class StopWords, people will not know what to do with it once they want to load a stem-exclusion table.
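The holder approach described above is the initialization-on-demand holder idiom. Below is a minimal sketch, assuming a hypothetical ExampleAnalyzer with a made-up four-word default set; it is not the actual code from the LUCENE-2034 patch. The nested DefaultSetHolder is initialized only on the first call to getDefaultStopSet(), so an analyzer constructed with a custom set never builds the default one, which addresses DM's concern. The defaultSetBuilds counter is an illustration-only addition that makes the laziness observable.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical analyzer illustrating the initialization-on-demand holder idiom.
class ExampleAnalyzer {
    // Illustration only: counts how often the default set was built.
    // No explicit initializer, so it can never be reset after the holder runs.
    static int defaultSetBuilds;

    private final Set<?> stopwords;

    ExampleAnalyzer() {
        this(getDefaultStopSet());
    }

    ExampleAnalyzer(Set<?> stopwords) {
        // Using a custom set never touches DefaultSetHolder.
        this.stopwords = stopwords;
    }

    static Set<?> getDefaultStopSet() {
        // The first access initializes DefaultSetHolder (JLS 12.4.1); the JVM
        // runs that initialization exactly once and thread-safely (JLS 12.4.2).
        return DefaultSetHolder.DEFAULT_STOP_SET;
    }

    Set<?> getStopwords() {
        return stopwords;
    }

    // Initialized only when DEFAULT_STOP_SET is first read, not when
    // ExampleAnalyzer itself is loaded.
    private static class DefaultSetHolder {
        static final Set<String> DEFAULT_STOP_SET;

        static {
            defaultSetBuilds++;
            DEFAULT_STOP_SET = Collections.unmodifiableSet(
                new HashSet<>(Arrays.asList("a", "an", "and", "the")));
        }
    }
}
```

Because the holder's static final field is effectively a lazily created singleton, no volatile field or synchronized accessor is needed; the class-loading machinery does the work.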
Maybe I'm missing something important, but I do not see the benefit of wrapping a Set<?> in another class. If so, please explain. :)

Thanks

> Massive Code Duplication in Contrib Analyzers - unifly the analyzer ctors
> -------------------------------------------------------------------------
>
> Key: LUCENE-2034
> URL: https://issues.apache.org/jira/browse/LUCENE-2034
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/analyzers
> Affects Versions: 2.9
> Reporter: Simon Willnauer
> Assignee: Robert Muir
> Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2034,patch, LUCENE-2034,patch, LUCENE-2034.patch,
> LUCENE-2034.patch, LUCENE-2034.patch, LUCENE-2034.patch, LUCENE-2034.patch,
> LUCENE-2034.txt
>
>
> Due to the various tokenStream APIs we had in Lucene, analyzer subclasses
> need to implement at least one of the methods returning a tokenStream. When
> you look at the code, it appears to be almost identical if both are
> implemented in the same analyzer. Each analyzer defines the same inner class
> (SavedStreams), which is unnecessary.
> In contrib almost every analyzer uses stopwords, and each of them creates its
> own way of loading them or defines a large number of ctors to load stopwords
> from a file, set, arrays etc. Those ctors should be deprecated and
> eventually removed.
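The consolidation argued for in the comment and requested in the issue description (one Set-based constructor instead of separate file, array and set ctors in every analyzer) can be sketched in plain Java. StopAwareBase, WordSets.loadWordSet and ExampleStopAnalyzer are hypothetical names standing in for StopawareAnalyzer and WordlistLoader; they do not reproduce either class's real API. The loader assumes a simple file format: one word per line, with blank lines and lines starting with '#' skipped.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a stopword-aware analyzer base class:
// subclasses get exactly one way to receive stopwords, a Set<?>.
abstract class StopAwareBase {
    protected final Set<?> stopwords;

    protected StopAwareBase(Set<?> stopwords) {
        if (stopwords == null) {
            this.stopwords = Collections.emptySet();
        } else {
            // Read-only defensive copy so later changes by the caller have no effect.
            this.stopwords = Collections.unmodifiableSet(new HashSet<Object>(stopwords));
        }
    }

    boolean isStopword(String term) {
        return stopwords.contains(term);
    }
}

// Hypothetical loader in the spirit of WordlistLoader (not its real API):
// one word per line; blank lines and lines starting with '#' are skipped.
final class WordSets {
    private WordSets() {}

    static Set<String> loadWordSet(Reader reader) {
        Set<String> words = new HashSet<>();
        // Note: try-with-resources closes the caller's reader when done.
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty() && !word.startsWith("#")) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            // Wrap so callers are not forced to declare IOException.
            throw new UncheckedIOException(e);
        }
        return words;
    }
}

// Hypothetical concrete analyzer: no file/array/set constructor overloads needed.
class ExampleStopAnalyzer extends StopAwareBase {
    ExampleStopAnalyzer(Set<?> stopwords) {
        super(stopwords);
    }
}
```

Because the loader returns a plain Set<String>, the same call could just as well feed a stem-exclusion table, which is the reason given in the comment for not introducing a separate StopWords wrapper class.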