[ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798319#comment-13798319 ]
SooMyung Lee commented on LUCENE-4956:
--------------------------------------

Hi, all.

As Christian recommended, I'm going to explain how I developed this code, because of the license and legal concerns that [~jkrupan] mentioned in a previous comment.

I started writing this code and the dictionary in 2006, based on a book by Seung-Shik Kang, who is now a professor at Kookmin University.

The dictionary consists of several files; the major ones are total.dic, josa.dic, eomi.dic and syllable.dic. In the first step of developing the dictionary, I collected basic stem words for total.dic, and particles for josa.dic and eomi.dic, from the book and from various websites. I then checked in online dictionaries how the basic stem words can be used. For syllable.dic, I referred only to the book. I created the rest of the files myself during development, except for mapHanja.dic, which I added two years ago. I am not sure that this file is free of legal problems, because much of its data came from project results, so it would be better to remove that data.

For the source code, I also referred to the book, so the major logic is based on it, except for some utility classes such as the String and File utilities and Trie.java. I copied most of the utility classes from the Apache Commons project, but Trie.java came from another website. I cannot remember the exact website now because it happened a long time ago, but I do remember reading its license, and it was the Apache license.

I finished the first version in 2008, created an online community on a website called Naver, and uploaded the source code there. The community currently has over 3,700 members.

In 2009 I took part in an open source contest held by a Korean government organization. During the contest, I uploaded the source code to SourceForge, and the code was put through a BlackDuck license test, which it passed.

I have supported users through the online community (http://cafe.naver.com/korlucene). Some users have improved the dictionaries and source code and posted their changes on the website.
I merged those changes and released the code again.

This is the whole process of how I developed the code. If anybody has a recommendation, please let me know.

> the korean analyzer that has a korean morphological analyzer and dictionaries
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-4956
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4956
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>    Affects Versions: 4.2
>            Reporter: SooMyung Lee
>            Assignee: Christian Moen
>              Labels: newbie
>         Attachments: eval.patch, kr.analyzer.4x.tar, lucene-4956.patch, lucene4956.patch, LUCENE-4956.patch
>
>
> The Korean language has specific characteristics. When developing a search
> service with Lucene & Solr in Korean, there are some problems in searching
> and indexing. The Korean analyzer solves these problems with a Korean
> morphological analyzer. It consists of a Korean morphological analyzer,
> dictionaries, a Korean tokenizer and a Korean filter. The Korean analyzer is
> made for Lucene and Solr. If you develop a search service with Lucene in
> Korean, the Korean analyzer is the best choice.

--
This message was sent by Atlassian JIRA
(v6.1#6144)