[ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895754#comment-13895754
 ] 

SooMyung Lee commented on LUCENE-4956:
--------------------------------------

Hi [~daedeqi],
I created the SourceForge project and contributed it to the Apache Lucene project.
I'm working on fixing some of the problems mentioned in this Jira issue.
The problem you mentioned with "위키백과는", however, is solved if you add the word
"위키" to the dictionary.
There is no perfect way to analyze Korean sentences with an algorithm alone,
the way Porter stemming works for English sentences. So we use both an
algorithm and a dictionary to analyze Korean sentences. The dictionary for the
Korean analyzer has around 40,000 words. I think most basic Korean words are
included in it, but many loan words such as "위키" are not.
Users of the Korean analyzer in Korea usually build their own dictionaries for
their own purposes.
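
To illustrate why a missing dictionary entry matters, here is a minimal toy
sketch. It is not the Korean analyzer's actual code or algorithm: the particle
list, the dictionary entries ("백과", "사전") and the function names are all
assumptions made up for this example. It strips one trailing particle and then
tries to split the remaining stem into dictionary words.

```python
# Toy illustration only: not the Korean analyzer's real algorithm or data.
PARTICLES = ("은", "는", "이", "가", "을", "를")  # a few common postpositions


def split_compound(stem, dictionary):
    """Return a list of dictionary words that concatenate to `stem`, or None."""
    if not stem:
        return []
    for end in range(len(stem), 0, -1):  # prefer the longest leading word
        head = stem[:end]
        if head in dictionary:
            rest = split_compound(stem[end:], dictionary)
            if rest is not None:
                return [head] + rest
    return None


def analyze(word, dictionary):
    """Strip one trailing particle, then segment the stem with the dictionary."""
    for particle in PARTICLES:
        if word.endswith(particle) and len(word) > len(particle):
            parts = split_compound(word[: -len(particle)], dictionary)
            if parts is not None:
                return parts, particle
    # No particle matched: try the whole word as a compound.
    parts = split_compound(word, dictionary)
    return (parts, "") if parts is not None else None


base_dictionary = {"백과", "사전"}            # hypothetical stock entries
user_dictionary = base_dictionary | {"위키"}  # user adds the loan word

print(analyze("위키백과는", base_dictionary))  # analysis fails without "위키"
print(analyze("위키백과는", user_dictionary))  # succeeds once "위키" is added
```

In this sketch the rule part (particle stripping) is the same in both calls;
only the dictionary changes the outcome.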

> the korean analyzer that has a korean morphological analyzer and dictionaries
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-4956
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4956
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>    Affects Versions: 4.2
>            Reporter: SooMyung Lee
>            Assignee: Christian Moen
>              Labels: newbie
>         Attachments: LUCENE-4956.patch, eval.patch, kr.analyzer.4x.tar, 
> lucene-4956.patch, lucene4956.patch
>
>
> The Korean language has specific characteristics. When developing a search
> service with Lucene & Solr in Korean, there are some problems in searching
> and indexing. The Korean analyzer solves these problems with a Korean
> morphological analyzer. It consists of a Korean morphological analyzer,
> dictionaries, a Korean tokenizer and a Korean filter. The Korean analyzer is
> made for Lucene and Solr. If you develop a search service with Lucene in
> Korean, choosing the Korean analyzer is the best idea.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
