[
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643659#comment-13643659
]
Jack Krupansky commented on LUCENE-4956:
----------------------------------------
As a user trying to browse and find analyzers and tokenizers for specific
languages, I object. I should be able to look at the language code and guess
which module it might be in. It's one thing if the module name is reasonably
general and average users can be expected to readily associate it with
specific languages, or if it groups languages categorically, but giving the
module an artificial name that would not be obvious to an average user seems
like a poor choice to me.
Even if you just called the module "korean", at least that would be a helpful
guide to people like me browsing the list of modules, and then the package name
can distinguish the implementations for that language.
Also, it should be possible to mix multiple implementations for the same
language in the same application, so the package name does need to have some
unique name, unless there is guaranteed to be only one implementation for that
language.
I would suggest that there should be two choices for language-based analysis
modules:
1. Category name, where there is some general approach that covers a number of
languages that need to share classes.
2. Language code, hyphen, some arbitrary name for implementations that cover
only a single language.
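
To make the second choice concrete, here is a minimal sketch of how a
"language code, hyphen, name" module could be built and recognized. The class
ModuleNaming, its method names, and the module name "ko-example" are all
hypothetical and exist only for illustration; nothing here is an existing
Lucene API. It also assumes that category names never start with a two- or
three-letter prefix followed by a hyphen, which a real scheme would have to
settle.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the proposed naming scheme; not part of Lucene.
public class ModuleNaming {
    // Choice 2: language code, hyphen, arbitrary implementation name.
    // Assumption: language codes are 2 or 3 lowercase letters.
    private static final Pattern SINGLE_LANGUAGE =
        Pattern.compile("^([a-z]{2,3})-([a-z0-9]+)$");

    // Builds a single-language module name such as "ko-example".
    public static String singleLanguageModule(String languageCode, String implName) {
        return languageCode + "-" + implName;
    }

    // Returns the language code if the name follows choice 2,
    // or empty if it looks like a category module (choice 1).
    public static Optional<String> languageOf(String moduleName) {
        Matcher m = SINGLE_LANGUAGE.matcher(moduleName);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String module = singleLanguageModule("ko", "example");
        System.out.println(module + " -> "
            + languageOf(module).orElse("(category module)"));
        System.out.println("common -> "
            + languageOf("common").orElse("(category module)"));
    }
}
```

With a scheme like this, someone browsing the module list could find every
single-language module for Korean just by looking for the "ko-" prefix.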
Even for #1, I would suggest that there should be a prefix that covers the
"type" of languages covered (Eastern European, Asian, etc.).
That said, I would not stand in the way of adding Korean analysis as soon as
possible. I mean, this contribution shouldn't have to correct all of the sins
of past contributions.
> the korean analyzer that has a korean morphological analyzer and dictionaries
> -----------------------------------------------------------------------------
>
> Key: LUCENE-4956
> URL: https://issues.apache.org/jira/browse/LUCENE-4956
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/analysis
> Affects Versions: 4.2
> Reporter: SooMyung Lee
> Labels: newbie
> Attachments: kr.analyzer.4x.tar
>
>
> The Korean language has specific characteristics. When developing a search
> service with Lucene & Solr in Korean, there are some problems in searching and
> indexing. The Korean analyzer solves these problems with a Korean
> morphological analyzer. It consists of a Korean morphological analyzer,
> dictionaries, a Korean tokenizer and a Korean filter. The Korean analyzer is
> made for Lucene and Solr. If you develop a search service with Lucene in
> Korean, it is the best idea to choose the Korean analyzer.