[
https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733538#action_12733538
]
Simon Willnauer commented on LUCENE-1728:
-----------------------------------------
Robert, I have looked at this patch and, more importantly, at the source
itself, and I increasingly get the impression that this analyzer and the
related classes need more work than just moving them into one package and
making everything package private. From my understanding the Hidden Markov
Model Segmenter is a feature which could be replaced by some other algorithm.
Once you have such a feature relationship I would prefer packages by feature,
which lets you remove a single feature just by removing a whole package.
In other words, I would love to see a general refactoring of the code which
exposes a small, common API in the base package that is then used by the HHMM
"feature". That is quite a bit of work, and I do not consider it 2.9 work.
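To make the package-by-feature idea concrete, here is a rough sketch of what
I have in mind. All names (WordSegmenter, HhmmSegmenter, SentenceAnalyzer) are
hypothetical and are not taken from the patch or the existing code, and the
segmentation logic is a placeholder, not the HHMM algorithm. The three types
live in one file here only to keep the example self-contained; in a real
layout the interface and the analyzer would sit in the base package and the
implementation in its own feature package.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical small API that would live in the base package.
interface WordSegmenter {
    List<String> segment(String sentence);
}

// Hypothetical implementation that would live in its own feature package
// (e.g. an "hhmm" package). Removing that package removes the feature.
final class HhmmSegmenter implements WordSegmenter {
    @Override
    public List<String> segment(String sentence) {
        // Placeholder only: emits one token per char, NOT real HHMM segmentation.
        List<String> tokens = new ArrayList<String>();
        for (int i = 0; i < sentence.length(); i++) {
            tokens.add(String.valueOf(sentence.charAt(i)));
        }
        return tokens;
    }
}

// Hypothetical base-package class: depends only on the interface, so the
// segmentation algorithm can be swapped without touching this code.
final class SentenceAnalyzer {
    private final WordSegmenter segmenter;

    SentenceAnalyzer(WordSegmenter segmenter) {
        this.segmenter = segmenter;
    }

    List<String> analyze(String sentence) {
        return segmenter.segment(sentence);
    }
}
```

The point is only the direction of the dependency: the base package knows the
interface and nothing about HHMM, so a different algorithm could be dropped in
by passing another WordSegmenter to the constructor.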
So here is the question: do we keep the structure as it is and just move it to
a new subdir to build a separate jar, or do we move the classes into one single
package (as you did in the patch) and build up a clean HHMM package later in
3.*?
Besides the packaging, I found heaps of things I do not like very much in the
code (not your patch :) and my fingertips get nervous when I see stuff like
the AbstractDictionary hierarchy or those Singletons. I would really like to
have this separation of CN and common Analyzers in for 2.9 -- we just need to
decide which way we go. I guess moving it over without changing code would be
easiest.
simon
> Move SmartChineseAnalyzer & resources to own contrib project
> ------------------------------------------------------------
>
> Key: LUCENE-1728
> URL: https://issues.apache.org/jira/browse/LUCENE-1728
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/analyzers
> Reporter: Simon Willnauer
> Assignee: Simon Willnauer
> Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1728.txt, LUCENE-1728.txt, LUCENE-1728.txt
>
>
> SmartChineseAnalyzer depends on a large dictionary that causes the analyzer
> jar to grow up to 3MB. The dictionary is quite big compared to all the other
> resources / class files contained in that jar.
> Having a separate analyzer-cn contrib project enables footprint-sensitive
> users (e.g. using lucene on a mobile phone) to include analyzer.jar without
> getting into trouble with disk space.
> Moving SmartChineseAnalyzer to a separate project could also include a small
> refactoring. As Robert mentioned in
> [LUCENE-1722|https://issues.apache.org/jira/browse/LUCENE-1722], several
> classes should be package protected, members and classes could be final,
> commented-out syserr and logging code should be removed, etc.
> I set the target for this issue to 2.9 - if we cannot make it by then, feel
> free to move it to 3.0.