[
https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir updated LUCENE-1728:
--------------------------------
Attachment: LUCENE-1728.txt
Simon, below is the method I used to do the refactoring with this patch.
I know I am pushing the limits of what counts as a "refactoring", but in my
opinion this minor cleanup was necessary to prevent internal structures from
being exposed:
* Use of two Tokenizers in the same analyzer was confusing, so WordTokenizer is
now a TokenFilter (WordTokenFilter).
* The analyzer uses the standard WordListLoader rather than custom loading code.
* Rather than force SmartChineseAnalyzer to keep track of internal heavyweight
structures, it implements reusableTokenStream, etc.
I added a few tests to ensure I didn't break anything in the
SmartChineseAnalyzer.
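
To make the first and third points above more concrete, here is a small,
self-contained Java sketch. It deliberately does not use Lucene's classes:
SimpleTokenStream, SentenceSource, WordSplitFilter, ReusingAnalyzer and
ReuseDemo are hypothetical names made up for this illustration, and splitting
on '.' and whitespace is only a stand-in for the real segmentation. It tries to
show two things: a single source stage feeding a filter stage (instead of two
tokenizers), and an analyzer that keeps one chain per thread and re-points it
at a new Reader rather than rebuilding it on every call.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for a token stream; this is NOT Lucene's TokenStream API. */
interface SimpleTokenStream {
    /** Returns the next token, or null when the input is exhausted. */
    String next() throws IOException;
}

/** Source stage: the only class that reads characters. Emits one token per '.'-terminated sentence. */
final class SentenceSource implements SimpleTokenStream {
    private Reader input;
    SentenceSource(Reader input) { this.input = input; }
    /** Points this source at new input so the instance can be reused. */
    void reset(Reader newInput) { this.input = newInput; }
    public String next() throws IOException {
        while (true) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = input.read()) != -1 && c != '.') {
                sb.append((char) c);
            }
            String sentence = sb.toString().trim();
            if (!sentence.isEmpty()) return sentence;
            if (c == -1) return null; // end of input, nothing left to emit
        }
    }
}

/** Filter stage: consumes sentences from another stream and emits their words. */
final class WordSplitFilter implements SimpleTokenStream {
    private final SimpleTokenStream in;
    private final Deque<String> pending = new ArrayDeque<>();
    WordSplitFilter(SimpleTokenStream in) { this.in = in; }
    /** Drops words buffered from a previous, possibly unfinished, pass. */
    void reset() { pending.clear(); }
    public String next() throws IOException {
        while (pending.isEmpty()) {
            String sentence = in.next();
            if (sentence == null) return null;
            for (String word : sentence.split("\\s+")) pending.add(word);
        }
        return pending.poll();
    }
}

/** Analyzer that builds its source+filter chain once per thread and reuses it. */
final class ReusingAnalyzer {
    private static final class Chain {
        final SentenceSource source;
        final WordSplitFilter words;
        Chain(Reader reader) {
            source = new SentenceSource(reader);
            words = new WordSplitFilter(source);
        }
    }

    private final ThreadLocal<Chain> cached = new ThreadLocal<>();

    SimpleTokenStream reusableTokenStream(Reader reader) {
        Chain chain = cached.get();
        if (chain == null) {
            chain = new Chain(reader);   // first call on this thread: build the chain
            cached.set(chain);
        } else {
            chain.source.reset(reader);  // later calls: reuse the same objects
            chain.words.reset();
        }
        return chain.words;
    }

    /** Helper for the demo: collects all remaining tokens into a list. */
    static List<String> drain(SimpleTokenStream stream) throws IOException {
        List<String> out = new ArrayList<>();
        for (String t = stream.next(); t != null; t = stream.next()) out.add(t);
        return out;
    }
}

final class ReuseDemo {
    public static void main(String[] args) throws IOException {
        ReusingAnalyzer analyzer = new ReusingAnalyzer();
        SimpleTokenStream first = analyzer.reusableTokenStream(new StringReader("one two. three"));
        System.out.println(ReusingAnalyzer.drain(first));
        SimpleTokenStream second = analyzer.reusableTokenStream(new StringReader("four."));
        System.out.println(first == second); // same chain object on the same thread
        System.out.println(ReusingAnalyzer.drain(second));
    }
}
```

Keeping the chain inside the analyzer means callers never hold on to the
heavyweight pieces themselves; they only ever see the stream that comes back.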
{noformat}
## 1. clean svn checkout
## 2. run the following commands to refactor the files.
mkdir -p contrib/analysis/smartcn/src/java/org/apache/lucene/analysis/cn \
  contrib/analysis/smartcn/src/test/org/apache/lucene/analysis/cn \
  contrib/analysis/smartcn/src/resources/org/apache/lucene/analysis/cn
svn add contrib/analysis
svn move \
  contrib/analyzers/src/java/org/apache/lucene/analysis/cn/SmartChineseAnalyzer.java \
  contrib/analysis/smartcn/src/java/org/apache/lucene/analysis/cn
svn move contrib/analyzers/src/java/org/apache/lucene/analysis/cn/smart/hhmm/* \
  contrib/analysis/smartcn/src/java/org/apache/lucene/analysis/cn
svn move contrib/analyzers/src/java/org/apache/lucene/analysis/cn/smart/*.java \
  contrib/analysis/smartcn/src/java/org/apache/lucene/analysis/cn
svn delete contrib/analyzers/src/java/org/apache/lucene/analysis/cn/smart
svn move \
  contrib/analyzers/src/test/org/apache/lucene/analysis/cn/TestSmartChineseAnalyzer.java \
  contrib/analysis/smartcn/src/test/org/apache/lucene/analysis/cn
svn move \
  contrib/analyzers/src/resources/org/apache/lucene/analysis/cn/stopwords.txt \
  contrib/analysis/smartcn/src/resources/org/apache/lucene/analysis/cn
svn move \
  contrib/analyzers/src/resources/org/apache/lucene/analysis/cn/smart/hhmm/* \
  contrib/analysis/smartcn/src/resources/org/apache/lucene/analysis/cn
svn delete contrib/analyzers/src/resources/org/apache/lucene/analysis/cn
svn move \
  contrib/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/WordTokenizer.java \
  contrib/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/WordTokenFilter.java
svn move contrib/analyzers contrib/analysis
## 3. eclipse "refresh" at project level.
## 4. set text-file encoding at project level to UTF-8
## 5. manually force text-file encoding as UTF-8 for
##    contrib/analysis/analyzers/src/java/org/apache/lucene/analysis/cn/package.html
##    this is an existing encoding issue that is corrected by this patch.
## 6. apply patch from clipboard (you may now remove the above hack and you
##    will notice this file is now detected properly as UTF-8)
{noformat}
> Move SmartChineseAnalyzer & resources to own contrib project
> ------------------------------------------------------------
>
> Key: LUCENE-1728
> URL: https://issues.apache.org/jira/browse/LUCENE-1728
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/analyzers
> Reporter: Simon Willnauer
> Assignee: Simon Willnauer
> Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1728.txt
>
>
> SmartChineseAnalyzer depends on a large dictionary that causes the analyzer
> jar to grow up to 3MB. The dictionary is quite big compared to all the other
> resources / class files contained in that jar.
> Having a separate analyzer-cn contrib project enables footprint-sensitive
> users (e.g. using Lucene on a mobile phone) to include analyzer.jar without
> getting into trouble with disk space.
> Moving SmartChineseAnalyzer to a separate project could also include a small
> refactoring. As Robert mentioned in
> [LUCENE-1722|https://issues.apache.org/jira/browse/LUCENE-1722], several
> classes should be package protected, members and classes could be final,
> commented-out syserr and logging code should be removed, etc.
> I set this issue's target to 2.9. If we cannot make it by then, feel free
> to move it to 3.0.