[ 
https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733538#action_12733538
 ] 

Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

Robert, I have looked at this patch and, more importantly, at the source itself, 
and I increasingly get the impression that this analyzer and its related classes 
need more work than just moving them into one package and making everything 
package-private. As I understand it, the Hidden Markov Model segmenter is a 
feature that could be replaced by some other algorithm. Once you have that kind 
of feature relationship, I would prefer packaging by feature, which lets you 
remove a single feature just by removing a whole package. 
In other words, I would love to see a general refactoring of the code that 
exposes a tiny but common API in the base package, which is then used by 
the HHMM "feature". That is quite a bit of work, and I do not consider it 
2.9 work. 
So here is the question: do we keep the structure as it is and just move it to 
a new subdirectory to build a separate jar, or do we move the classes into one 
single package (as you did in the patch) and build up a clean HHMM package 
later in 3.*?
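
To make the packages-by-feature idea a bit more concrete, here is a rough 
sketch of what such a split might look like. All names in it 
(SentenceSegmenter, SingleCharSegmenter, WordTokenizerSketch) are hypothetical 
and are not taken from the patch or the existing code. The interface stands in 
for the tiny common API in the base package, and the single-character 
segmenter is only a placeholder for a replaceable feature such as the HHMM 
segmenter, which would live in its own package.

```java
import java.util.ArrayList;
import java.util.List;

// Demo entry point; everything below is an illustrative assumption.
class PackageByFeatureSketch {
    public static void main(String[] args) {
        // Swapping the segmentation feature means swapping this one argument.
        WordTokenizerSketch tokenizer =
            new WordTokenizerSketch(new SingleCharSegmenter());
        System.out.println(tokenizer.tokens("我是中国人"));
    }
}

// Base package: the small common API (hypothetical name).
interface SentenceSegmenter {
    List<String> segment(String sentence);
}

// Feature package stand-in: one replaceable implementation. A real HHMM
// segmenter would sit in its own package and implement the same interface.
final class SingleCharSegmenter implements SentenceSegmenter {
    @Override
    public List<String> segment(String sentence) {
        List<String> tokens = new ArrayList<String>();
        for (int i = 0; i < sentence.length(); ) {
            int cp = sentence.codePointAt(i);
            if (!Character.isWhitespace(cp)) {
                // Emit each non-whitespace code point as its own token.
                tokens.add(new String(Character.toChars(cp)));
            }
            i += Character.charCount(cp);
        }
        return tokens;
    }
}

// Analyzer-side code depends only on the interface, never on a feature class.
final class WordTokenizerSketch {
    private final SentenceSegmenter segmenter;

    WordTokenizerSketch(SentenceSegmenter segmenter) {
        this.segmenter = segmenter;
    }

    List<String> tokens(String sentence) {
        return segmenter.segment(sentence);
    }
}
```

With a layout like this, removing a feature would come down to deleting its 
package and no longer passing its implementation in, while the base package 
stays untouched.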

Besides the packaging, I found heaps of things in the code that I do not like 
very much (not in your patch :), and my fingertips get nervous when I see stuff 
like the AbstractDictionary hierarchy or those singletons. I would really like 
to have this separation of CN and common analyzers in for 2.9 -- we just need 
to decide which way we go. I guess moving it over without changing code would 
be easiest.

simon


> Move SmartChineseAnalyzer & resources to own contrib project
> ------------------------------------------------------------
>
>                 Key: LUCENE-1728
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1728
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1728.txt, LUCENE-1728.txt, LUCENE-1728.txt
>
>
> SmartChineseAnalyzer depends on a large dictionary that causes the analyzer 
> jar to grow to 3MB. The dictionary is quite big compared to all the other 
> resources / class files contained in that jar. 
> Having a separate analyzer-cn contrib project enables footprint-sensitive 
> users (e.g. using Lucene on a mobile phone) to include analyzer.jar without 
> getting into trouble with disk space.
> Moving SmartChineseAnalyzer to a separate project could also include a small 
> refactoring. As Robert mentioned in 
> [LUCENE-1722|https://issues.apache.org/jira/browse/LUCENE-1722], several 
> classes should be package-private, members and classes could be final, 
> commented-out syserr and logging code should be removed, etc.
> I set this issue's target to 2.9. If we cannot make it by then, feel free 
> to move it to 3.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


