[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733538#action_12733538 ]
Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

Robert, I have looked at this patch and, more importantly, at the source itself, and I increasingly get the impression that this analyzer and its related classes need more work than just moving them into one package and making everything package-private. As I understand it, the Hidden Markov Model segmenter is a feature that could be replaced by some other algorithm. Given that kind of feature relationship, I would prefer packaging by feature, which lets you remove a single feature simply by removing its whole package. In other words, I would love to see a general refactoring of the code that exposes a small, common API in the base package, which the HHMM "feature" then builds on. That is quite a bit of work, and I do not consider it 2.9 work.

So here is the question: do we keep the structure as it is and just move it to a new subdirectory to build a separate jar, or do we move everything into one single package (as you did in the patch) and build up a clean HHMM package later in 3.x?

Besides the packaging, I found heaps of things in the code that I do not like very much (not in your patch :), and my fingertips get nervous when I see things like the AbstractDictionary hierarchy or those singletons.

I would really like to get this separation of the CN analyzer from the common analyzers into 2.9; we just need to decide which way to go. I guess moving it over without changing the code would be easiest.
simon

> Move SmartChineseAnalyzer & resources to own contrib project
> ------------------------------------------------------------
>
>                 Key: LUCENE-1728
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1728
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1728.txt, LUCENE-1728.txt, LUCENE-1728.txt
>
>
> SmartChineseAnalyzer depends on a large dictionary that causes the analyzer
> jar to grow to 3MB. The dictionary is quite big compared to all the other
> resources and class files contained in that jar.
> Having a separate analyzer-cn contrib project enables footprint-sensitive
> users (e.g. those using Lucene on a mobile phone) to include analyzer.jar
> without running into trouble with disk space.
> Moving SmartChineseAnalyzer to a separate project could also include a small
> refactoring. As Robert mentioned in
> [LUCENE-1722|https://issues.apache.org/jira/browse/LUCENE-1722], several
> classes should be package-protected, members and classes could be final,
> commented-out syserr and logging code should be removed, etc.
> I set this issue's target to 2.9; if we cannot make it by then, feel free
> to move it to 3.0.