[
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644076#comment-13644076
]
Jack Krupansky commented on LUCENE-4956:
----------------------------------------
Looking at the actual tar file, I notice that it has the factory classes placed
in "solr" directories rather than in the lucene directories as factories are
normally organized.
By all means proceed with producing a normal patch that shows the final
organization of this new analysis package.
Some other issues:
1. Complete absence of Javadoc for the tokenizer factory and token filter
factory classes, so it is not "Solr user-ready" at present. There should be an
XML example of a token filter with its parameters, as is the usual practice in
Lucene/Solr.
2. No Apache license headers in the "Solr" code. I thought this stuff was
already supposed to be ASL 2.0?
3. No Solr schema.xml change to add the text_ko field type.
4. At least the KoreanAnalyzer.java and KoreanTokenizer.java source files
contain tab characters, which is an odd format. They need to be normalized to
Lucene project conventions.
5. There is a hardwired stop word list in KoreanAnalyzer that appears to be
nearly identical or close to StopAnalyzer.ENGLISH_STOP_WORDS_SET. Why doesn't
that static code copy the StopAnalyzer list and then add the few extra terms
that are needed? If there is a reason, place it in a comment.
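A minimal sketch of what item 5 suggests: start from the shared English list and add only the extra terms. To stay runnable without Lucene on the classpath, it uses plain java.util sets; ENGLISH_STOP_WORDS is a stand-in for StopAnalyzer.ENGLISH_STOP_WORDS_SET (contents reproduced from memory of Lucene 4.x, so treat them as an assumption), and the class name and the two extra terms are hypothetical placeholders, not the terms actually present in KoreanAnalyzer. In the real analyzer the merged set would presumably be a CharArraySet.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical class name; illustrates the pattern only.
final class KoreanStopWordsSketch {

  // Stand-in for StopAnalyzer.ENGLISH_STOP_WORDS_SET (assumed contents).
  static final Set<String> ENGLISH_STOP_WORDS =
      Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
          "a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
          "if", "in", "into", "is", "it", "no", "not", "of", "on", "or",
          "such", "that", "the", "their", "then", "there", "these", "they",
          "this", "to", "was", "will", "with")));

  // Placeholder extras: the real additional terms are not listed in this comment.
  static final Set<String> EXTRA_STOP_WORDS =
      Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
          "extra-term-1", "extra-term-2")));

  // Copy the shared list, then add the extras, instead of hardwiring a
  // near-duplicate of the English list.
  static final Set<String> STOP_WORDS;
  static {
    Set<String> merged = new HashSet<>(ENGLISH_STOP_WORDS);
    merged.addAll(EXTRA_STOP_WORDS);
    STOP_WORDS = Collections.unmodifiableSet(merged);
  }

  public static void main(String[] args) {
    System.out.println("stop words: " + STOP_WORDS.size());
  }
}
```

Building the set this way keeps the Korean-specific additions visible in one place and makes any deliberate divergence from the English list easy to document in a comment.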
But as I said, by all means proceed to a normal patch file now that the tar
contribution is "legal".
> the korean analyzer that has a korean morphological analyzer and dictionaries
> -----------------------------------------------------------------------------
>
> Key: LUCENE-4956
> URL: https://issues.apache.org/jira/browse/LUCENE-4956
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/analysis
> Affects Versions: 4.2
> Reporter: SooMyung Lee
> Labels: newbie
> Attachments: kr.analyzer.4x.tar
>
>
> The Korean language has specific characteristics. When developing a search
> service with Lucene & Solr in Korean, there are some problems in searching and
> indexing. The Korean analyzer solves these problems with a Korean morphological
> analyzer. It consists of a Korean morphological analyzer, dictionaries, a
> Korean tokenizer and a Korean filter. The Korean analyzer is made for Lucene
> and Solr. If you develop a search service with Lucene in Korean, it is the
> best idea to choose the Korean analyzer.