http://issues.apache.org/bugzilla/show_bug.cgi?id=29756

           Summary: analyzer refactoring based on CVS HEAD from 6/21/2004
           Product: Lucene
           Version: CVS Nightly - Specify date in submission
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: Enhancement
          Priority: Other
         Component: Analysis
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]

Hello,

As mentioned in previous exchanges, notably with Grant Ingersoll, I added some new classes to the "analysis" package to meet the requirements of the feature request in Bugzilla (http://issues.apache.org/bugzilla/show_bug.cgi?id=28182) and did some refactoring while I was under the hood.

This is an overview of the hierarchies after my changes:

-Analyzer
--CustomAnalyzer (new abstract class largely based on Grant's BaseAnalyzer)
--AbstractAnalyzer (new abstract class)
---RussianAnalyzer
---GermanAnalyzer
---etc.

-Tokenizer
--CloneableTokenizer (new abstract class)
---StandardTokenizer
---CharTokenizer
---CJKTokenizer
---etc.

-TokenFilter
--CloneableTokenFilter (new abstract class)
---AbstractStemFilter (new abstract class)
----RussianStemFilter
----GermanStemFilter
----etc.

-Stemmer (very simple new interface used in AbstractStemFilter)
--PorterStemmer
--RussianStemmer
--etc.

In the attached zip file there are 3 diff files (core.analysis, sandbox.analysis, and sandbox.analysis.snowball) and a zip containing the new classes for org.apache.lucene.analysis in the Lucene core.

I tried to minimize the irrelevant code changes (e.g. style, spaces, etc.) in the diffs while conforming to the code formatting guidelines outlined by Otis. I think a number of classes in the "analysis" package didn't conform, so these diffs may have a lot of noise because I reformatted those classes with my IDE, sorry :( . If the diffs are too painful, let me know and I'll try to prune them.

If there is a TODO list specific to Analyzers, are the items below on that list?
1) Move German and Russian packages to sandbox (I think this is on the Lucene TODO list)
2) Analyzer class renaming such that dynamic configuration could return classes like Analyzer_ru, Analyzer_de, Analyzer_fr, etc. based on the class naming scheme "Analyzer_{Locale.toString}"
3) Documentation

Questions, comments, feedback, and criticisms are all welcome.

Regards,
RBP

PS - Thanks Grant!
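
To make item 2 above concrete, here is a minimal sketch of how a lookup based on the "Analyzer_{Locale.toString}" naming scheme might work. It is not part of the attached patch: the class name AnalyzerLookup, the package org.example.analysis, and the fallback from a full locale (de_DE) to the language alone (de) are all assumptions made for illustration. It uses only the JDK and does not depend on Lucene.

```java
import java.util.Locale;

// Hypothetical helper, not part of the patch: resolves a class by the
// "Analyzer_{Locale.toString}" naming scheme proposed in item 2.
class AnalyzerLookup {

    /** Builds the candidate class name, e.g. "org.example.analysis.Analyzer_de". */
    static String classNameFor(String pkg, Locale locale) {
        return pkg + ".Analyzer_" + locale.toString();
    }

    /**
     * Tries the full locale first (Analyzer_de_DE), then the language alone
     * (Analyzer_de). The fallback order is an assumption of this sketch.
     * Returns null when no matching class is on the classpath.
     */
    static Class<?> lookup(String pkg, Locale locale) {
        Locale languageOnly = Locale.forLanguageTag(locale.getLanguage());
        String[] candidates = {
            classNameFor(pkg, locale),
            classNameFor(pkg, languageOnly)
        };
        for (String name : candidates) {
            try {
                return Class.forName(name);
            } catch (ClassNotFoundException e) {
                // No class under this name; try the next candidate.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // "org.example.analysis" is a placeholder package for illustration.
        System.out.println(classNameFor("org.example.analysis", Locale.GERMANY));
        System.out.println(lookup("org.example.analysis", Locale.GERMANY));
    }
}
```

A caller would still need to instantiate the returned class and check that it really is an Analyzer subclass; that part is left out because it depends on the constructors the renamed analyzers end up exposing.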
