[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-1728:
--------------------------------

    Attachment: LUCENE-1728.txt

Simon, below is the method I used to do the refactoring with this patch.

I know I am pressing the limits of what is a "refactoring", but in my opinion this minor cleanup was necessary to prevent internal structures from being exposed:
* Use of two Tokenizers in the same analyzer was confusing; WordTokenizer is now a TokenFilter.
* The analyzer uses the standard WordListLoader rather than custom stuff.
* Rather than force SmartChineseAnalyzer to keep track of internal heavyweight structures, it implements reusableTokenStream, etc.

I added a few tests to ensure I didn't break anything in the SmartChineseAnalyzer.

{noformat}
## 1. clean svn checkout
## 2. run the following commands to refactor the files.
mkdir -p contrib/analysis/smartcn/src/java/org/apache/lucene/analysis/cn contrib/analysis/smartcn/src/test/org/apache/lucene/analysis/cn contrib/analysis/smartcn/src/resources/org/apache/lucene/analysis/cn
svn add contrib/analysis
svn move contrib/analyzers/src/java/org/apache/lucene/analysis/cn/SmartChineseAnalyzer.java contrib/analysis/smartcn/src/java/org/apache/lucene/analysis/cn
svn move contrib/analyzers/src/java/org/apache/lucene/analysis/cn/smart/hhmm/* contrib/analysis/smartcn/src/java/org/apache/lucene/analysis/cn
svn move contrib/analyzers/src/java/org/apache/lucene/analysis/cn/smart/*.java contrib/analysis/smartcn/src/java/org/apache/lucene/analysis/cn
svn delete contrib/analyzers/src/java/org/apache/lucene/analysis/cn/smart
svn move contrib/analyzers/src/test/org/apache/lucene/analysis/cn/TestSmartChineseAnalyzer.java contrib/analysis/smartcn/src/test/org/apache/lucene/analysis/cn
svn move contrib/analyzers/src/resources/org/apache/lucene/analysis/cn/stopwords.txt contrib/analysis/smartcn/src/resources/org/apache/lucene/analysis/cn
svn move contrib/analyzers/src/resources/org/apache/lucene/analysis/cn/smart/hhmm/* contrib/analysis/smartcn/src/resources/org/apache/lucene/analysis/cn
svn delete contrib/analyzers/src/resources/org/apache/lucene/analysis/cn
svn move contrib/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/WordTokenizer.java contrib/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/WordTokenFilter.java
svn move contrib/analyzers contrib/analysis
## 3. eclipse "refresh" at project level.
## 4. set text-file encoding at project level to UTF-8
## 5. manually force text-file encoding as UTF-8 for contrib/analysis/analyzers/src/java/org/apache/lucene/analysis/cn/package.html
##    this is an existing encoding issue that is corrected by this patch.
## 6. apply patch from clipboard
##    (you may now remove the above hack and you will notice this file is now detected properly as UTF-8)
{noformat}

> Move SmartChineseAnalyzer & resources to own contrib project
> ------------------------------------------------------------
>
>                 Key: LUCENE-1728
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1728
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1728.txt
>
>
> SmartChineseAnalyzer depends on a large dictionary that causes the analyzer
> jar to grow up to 3MB. The dictionary is quite big compared to all the other
> resources / class files contained in that jar.
> Having a separate analyzer-cn contrib project enables footprint-sensitive
> users (e.g. using lucene on a mobile phone) to include analyzer.jar without
> getting into trouble with disk space.
> Moving SmartChineseAnalyzer to a separate project could also include a small
> refactoring, as Robert mentioned in
> [LUCENE-1722|https://issues.apache.org/jira/browse/LUCENE-1722]: several
> classes should be package protected, members and classes could be final,
> commented syserr and logging code should be removed, etc.
> I set this issue target to 2.9 - if we cannot make it by then, feel free
> to move it to 3.0.