[jira] [Commented] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

Jack Krupansky (JIRA) Sun, 28 Apr 2013 11:04:17 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644076#comment-13644076 ]


Jack Krupansky commented on LUCENE-4956:
----------------------------------------

Looking at the actual tar file, I notice that the factory classes are placed in 
"solr" directories rather than in the Lucene directories, where factories are 
normally organized.

By all means proceed with producing a normal patch that shows the final 
organization of this new analysis package.

Some other issues:

1. Javadoc is completely absent from the tokenizer factory and token filter 
factory classes, so they are not "Solr user-ready" at present. There should be 
an XML example of the token filter with its parameters, as is the usual 
practice in Lucene/Solr.

2. No Apache license headers in the "Solr" code. I thought this stuff was 
already supposed to be ASL 2.0?

3. No Solr schema.xml change to add the text_ko field type.

4. At least KoreanAnalyzer.java and KoreanTokenizer.java contain tab 
characters, which is an odd format. They need to be normalized to Lucene 
project conventions.

5. There is a hardwired stop word list in KoreanAnalyzer that appears to be 
nearly identical to StopAnalyzer.ENGLISH_STOP_WORDS_SET. Why doesn't that 
static code copy the StopAnalyzer list and then add the few extra terms that 
are needed? If there is a reason, explain it in a comment.
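For items 1 and 3: Lucene/Solr analysis factories conventionally carry a Javadoc block that embeds the Solr field type XML inside <pre class="prettyprint">, with the angle brackets escaped. The sketch below shows that shape for a text_ko field type. It is a stand-in that compiles with the JDK alone: the class name KoreanFilterFactorySketch, the factory names solr.KoreanTokenizerFactory and solr.KoreanFilterFactory, and the assumption that the filter takes no parameters are all hypothetical. The real factory would extend Lucene's TokenFilterFactory, and how it receives its arguments depends on the Lucene version.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Factory for the Korean token filter (hypothetical stand-in; the real class
 * would extend TokenFilterFactory and create the KoreanFilter).
 * <pre class="prettyprint">
 * &lt;fieldType name="text_ko" class="solr.TextField" positionIncrementGap="100"&gt;
 *   &lt;analyzer&gt;
 *     &lt;tokenizer class="solr.KoreanTokenizerFactory"/&gt;
 *     &lt;filter class="solr.KoreanFilterFactory"/&gt;
 *   &lt;/analyzer&gt;
 * &lt;/fieldType&gt;</pre>
 */
final class KoreanFilterFactorySketch {
  private final Map<String, String> args;

  KoreanFilterFactorySketch(Map<String, String> args) {
    // Work on a copy so the caller's map is left untouched.
    this.args = new HashMap<>(args);
    // Real parameters would be read and removed here, and each one documented
    // in the XML example above. Anything left over is a configuration error.
    if (!this.args.isEmpty()) {
      throw new IllegalArgumentException("Unknown parameters: " + this.args);
    }
  }
}
```

Rejecting leftover arguments means a misspelled attribute in schema.xml fails loudly instead of being silently ignored.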
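For item 4, a minimal sketch of tab normalization: each tab is expanded to spaces up to the next tab stop, and the column count resets at line breaks. Using a width of 2 assumes each leading tab stood for one indentation level and targets the two-space indentation used in Lucene sources; the class and method names are hypothetical, and running the files through an IDE formatter configured for the project style is the more thorough route.

```java
/** Expands tab characters to spaces, aligning to fixed-width tab stops. */
final class Detab {
  private Detab() {}

  static String expandTabs(String text, int tabWidth) {
    if (tabWidth <= 0) {
      throw new IllegalArgumentException("tabWidth must be positive");
    }
    StringBuilder out = new StringBuilder(text.length());
    int column = 0; // column within the current line
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      if (c == '\t') {
        // Pad with spaces up to the next multiple of tabWidth.
        int spaces = tabWidth - (column % tabWidth);
        for (int s = 0; s < spaces; s++) {
          out.append(' ');
        }
        column += spaces;
      } else {
        out.append(c);
        // A line break starts a new line, so the column restarts at 0.
        column = (c == '\n' || c == '\r') ? 0 : column + 1;
      }
    }
    return out.toString();
  }
}
```

For example, Detab.expandTabs("\tint x;", 2) yields two spaces followed by "int x;".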
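For item 5, the pattern being asked for is "copy the shared set, add the extras, freeze the result" rather than maintaining a second hardwired list. The sketch below uses only java.util so it runs on its own: ENGLISH_BASE is a small stand-in subset for StopAnalyzer.ENGLISH_STOP_WORDS_SET, and the names StopWords and extend are hypothetical. In Lucene 4.x the same idea would typically use CharArraySet (copy the English set with the CharArraySet copy constructor, add the extra terms, then wrap it with CharArraySet.unmodifiableSet).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class StopWords {
  private StopWords() {}

  // Stand-in subset of StopAnalyzer.ENGLISH_STOP_WORDS_SET, for illustration only.
  static final Set<String> ENGLISH_BASE = Collections.unmodifiableSet(
      new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "to")));

  /** Returns an unmodifiable copy of base plus the extra terms; base is not changed. */
  static Set<String> extend(Set<String> base, String... extras) {
    Set<String> copy = new HashSet<>(base);
    copy.addAll(Arrays.asList(extras));
    return Collections.unmodifiableSet(copy);
  }
}
```

KoreanAnalyzer could then declare its set as StopWords.extend(StopWords.ENGLISH_BASE, ...) with whatever extra terms it actually needs, so any future change to the English list is picked up automatically.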

But as I said, by all means proceed to a normal patch file now that the tar 
contribution is "legal".

                
> the korean analyzer that has a korean morphological analyzer and dictionaries
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-4956
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4956
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>    Affects Versions: 4.2
>            Reporter: SooMyung Lee
>              Labels: newbie
>         Attachments: kr.analyzer.4x.tar
>
>
> The Korean language has specific characteristics. When developing a search 
> service with Lucene & Solr in Korean, there are some problems in searching and 
> indexing. The Korean analyzer solves these problems with a Korean 
> morphological analyzer. It consists of a Korean morphological analyzer, 
> dictionaries, a Korean tokenizer and a Korean filter. The Korean analyzer is 
> made for Lucene and Solr. If you develop a search service with Lucene in 
> Korean, the Korean analyzer is the best choice.

