[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

Jack Krupansky (JIRA) Sat, 27 Apr 2013 06:14:20 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643659#comment-13643659
 ]


Jack Krupansky commented on LUCENE-4956:
----------------------------------------

As a user trying to browse and find analyzers and tokenizers for specific 
languages, I object. I mean, I should be able to look at the language code and 
guess what module it might be in. It's one thing if the module name is 
reasonably general and there is a reasonable expectation that average users 
would readily associate it with specific languages, or if it categorically 
groups languages, but giving the module an artificial name that would not be 
obvious to an average user seems like a poor choice to me.

Even if you just called the module "korean", at least that would be a helpful 
guide to people like me browsing the list of modules. Then the package name 
can distinguish the implementations for that language.

Also, it should be possible to mix multiple implementations for the same 
language in the same application, so the package name does need to have some 
unique name, unless there is guaranteed to be only one implementation for that 
language.

I would suggest that there should be two choices for language-based analysis 
modules:

1. Category name, where there is some general approach that covers a number of 
languages that need to share classes.
2. Language code, hyphen, some arbitrary name for implementations that cover 
only a single language.

Even for #1, I would suggest that there should be a prefix that covers the 
"type" of languages covered (Eastern European, Asian, etc.).
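
To make the two proposed naming choices concrete, here is a minimal sketch in 
Python. Everything in it is an assumption for illustration: the function names, 
the example module names ("ko-morph", "asian-cjk"), and the hyphen used between 
a type prefix and a category name are hypothetical and are not existing Lucene 
module names.

```python
from typing import Iterable, Optional


def category_module(category: str, type_prefix: Optional[str] = None) -> str:
    """Choice 1: a category name, optionally prefixed with a language "type"."""
    # Joining prefix and category with a hyphen is an assumption; the comment
    # only says there should be a prefix, not how it is attached.
    return f"{type_prefix}-{category}" if type_prefix else category


def single_language_module(language_code: str, implementation: str) -> str:
    """Choice 2: language code, hyphen, arbitrary implementation name."""
    return f"{language_code}-{implementation}"


def guess_language(module: str, known_codes: Iterable[str]) -> Optional[str]:
    """Return the language code a user could read off the module name, if any."""
    head = module.split("-", 1)[0]
    return head if head in set(known_codes) else None


if __name__ == "__main__":
    codes = ["ko", "ja", "zh"]  # hypothetical list of language codes
    for name in (single_language_module("ko", "morph"),
                 category_module("cjk", type_prefix="asian")):
        print(name, "->", guess_language(name, codes))
```

Under this scheme a user who knows the language code can find a 
single-language module by its first component, while a category module is 
recognised by its type prefix instead.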

That said, I would not stand in the way of adding Korean analysis as soon as 
possible. I mean, this contribution shouldn't have to correct all of the sins 
of past contributions.

                
> the korean analyzer that has a korean morphological analyzer and dictionaries
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-4956
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4956
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>    Affects Versions: 4.2
>            Reporter: SooMyung Lee
>              Labels: newbie
>         Attachments: kr.analyzer.4x.tar
>
>
> The Korean language has specific characteristics. When developing a search 
> service with Lucene and Solr in Korean, there are some problems in searching 
> and indexing. The Korean analyzer solves these problems with a Korean 
> morphological analyzer. It consists of a Korean morphological analyzer, 
> dictionaries, a Korean tokenizer and a Korean filter. The Korean analyzer is 
> made for Lucene and Solr. If you develop a search service with Lucene in 
> Korean, choosing the Korean analyzer is the best idea.
