[https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Robert Muir updated LUCENE-1728:
--------------------------------

    Attachment: LUCENE-1728.txt

Simon, here is the new patch. It also includes the build.xml and site.xml
changes needed to link the javadocs correctly, plus the regenerated docs.

{noformat}
## 1. clean svn checkout
## 2. run the following commands to refactor the files.

mkdir contrib/analyzers/common
mkdir -p contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn \
  contrib/analyzers/smartcn/src/test/org/apache/lucene/analysis/cn \
  contrib/analyzers/smartcn/src/resources/org/apache/lucene/analysis
svn add contrib/analyzers/smartcn contrib/analyzers/common
svn move contrib/analyzers/src/java/org/apache/lucene/analysis/cn/SmartChineseAnalyzer.java \
  contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn
svn move contrib/analyzers/src/java/org/apache/lucene/analysis/cn/smart \
  contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn
svn move contrib/analyzers/src/test/org/apache/lucene/analysis/cn/TestSmartChineseAnalyzer.java \
  contrib/analyzers/smartcn/src/test/org/apache/lucene/analysis/cn
svn move contrib/analyzers/src/resources/org/apache/lucene/analysis/cn \
  contrib/analyzers/smartcn/src/resources/org/apache/lucene/analysis
svn copy contrib/analyzers/build.xml contrib/analyzers/common
svn move contrib/analyzers/pom.xml.template contrib/analyzers/common
svn move contrib/analyzers/src contrib/analyzers/common
svn move contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/smart/WordTokenizer.java \
  contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/smart/WordTokenFilter.java

## 3. In Eclipse, "Refresh" at the project level.
## 4. Set the text-file encoding at the project level to UTF-8.
## 5. Manually force the text-file encoding to UTF-8 for
##      contrib/analyzers/common/src/java/org/apache/lucene/analysis/cn/package.html
##    and also for
##      contrib/analyzers/common/src/java/org/apache/lucene/analysis/cjk/package.html
##    This is an existing encoding issue that this patch corrects.
## 6. Apply the patch from the clipboard. (You may now remove the hack from
##    step 5; the files above will then be detected properly as UTF-8.)
{noformat}
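If you want to double-check the step 5 encoding issue outside Eclipse, here is a minimal Java sketch (not part of the patch; the class name `Utf8Check` is just illustrative). It strictly decodes each file as UTF-8 using the JDK's `CharsetDecoder` with `CodingErrorAction.REPORT` and reports any file containing malformed byte sequences. Note the limitation: a pure-ASCII file, or a file in another encoding whose bytes happen to form valid UTF-8, will also pass, so this only catches bytes that are definitely *not* UTF-8.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative helper (hypothetical name): reports whether files decode cleanly as UTF-8.
public class Utf8Check {

    /** Returns true if the bytes decode as UTF-8 without any malformed sequences. */
    static boolean isValidUtf8(byte[] bytes) {
        // A fresh strict decoder: fail instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Each command-line argument is a file path to check.
        for (String arg : args) {
            Path path = Paths.get(arg);
            boolean ok = isValidUtf8(Files.readAllBytes(path));
            System.out.println((ok ? "UTF-8 OK:  " : "NOT UTF-8: ") + path);
        }
    }
}
```

For example, after compiling with `javac Utf8Check.java`, run `java Utf8Check contrib/analyzers/common/src/java/org/apache/lucene/analysis/cjk/package.html` before and after applying the patch.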

> Move SmartChineseAnalyzer & resources to own contrib project
> ------------------------------------------------------------
>
>                 Key: LUCENE-1728
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1728
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1728.txt, LUCENE-1728.txt, LUCENE-1728.txt, 
> LUCENE-1728.txt
>
>
> SmartChineseAnalyzer depends on a large dictionary that causes the analyzer 
> jar to grow to 3MB. The dictionary is quite big compared to all the other 
> resources / class files contained in that jar. 
> Having a separate analyzer-cn contrib project enables footprint-sensitive 
> users (e.g. using Lucene on a mobile phone) to include analyzer.jar without 
> getting into trouble with disk space.
> Moving SmartChineseAnalyzer to a separate project could also include a small 
> refactoring, as Robert mentioned in 
> [LUCENE-1722|https://issues.apache.org/jira/browse/LUCENE-1722]: several 
> classes should be package protected, members and classes could be final, 
> commented-out syserr and logging code should be removed, etc.
> I set this issue's target to 2.9. If we cannot make it by then, feel free 
> to move it to 3.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
