[jira] [Comment Edited] (LUCENE-4956) the korean analyzer that has a korean morphological analyzer and dictionaries

Steve Rowe (JIRA) Sun, 05 May 2013 14:34:16 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649404#comment-13649404
 ]


Steve Rowe edited comment on LUCENE-4956 at 5/5/13 9:32 PM:
------------------------------------------------------------

{quote}
Could you comment about the origins and authorship of 
org.apache.lucene.analysis.kr.utils.StringUtil in your tar file?
I'm seeing a lot of authors in this file. Is this from Apache Commons Lang? 
Thanks!
{quote}

I looked at the file content, and it's definitely from Apache Commons Lang (the 
class is named {{StringUtils}} there, renamed {{StringUtil}} here), circa early 
2010, maybe with a little pulled in from another Commons Lang class.

I've eliminated StringUtil. Almost all of its uses are calls to 
StringUtils.split(String, separators), whose javadoc is:

{code:java}
/**
 * <p>Splits the provided text into an array, separators specified.
 * This is an alternative to using StringTokenizer.</p>
 *
 * <p>The separator is not included in the returned String array.
 * Adjacent separators are treated as one separator.
 * For more control over the split use the StrTokenizer class.</p>
 *
 * <p>A <code>null</code> input String returns <code>null</code>.
 * A <code>null</code> separatorChars splits on whitespace.</p>
 *
 * <pre>
 * StringUtil.split(null, *)         = null
 * StringUtil.split("", *)           = []
 * StringUtil.split("abc def", null) = ["abc", "def"]
 * StringUtil.split("abc def", " ")  = ["abc", "def"]
 * StringUtil.split("abc  def", " ") = ["abc", "def"]
 * StringUtil.split("ab:cd:ef", ":") = ["ab", "cd", "ef"]
 * </pre>
 *
 * @param str  the String to parse, may be null
 * @param separatorChars  the characters used as the delimiters,
 *  <code>null</code> splits on whitespace
 * @return an array of parsed Strings, <code>null</code> if null String input
 */
{code}
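For reference, the contract in that javadoc can be reproduced in plain Java without the Commons Lang dependency. This is a hypothetical sketch, not the Commons Lang source: the class name SplitContractSketch and method name splitOnChars are made up for illustration. It only mirrors the documented behavior: null input returns null, a null separatorChars splits on whitespace, separators are dropped, and adjacent separators collapse so no empty tokens appear.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class SplitContractSketch {

    /**
     * Hypothetical re-implementation of the documented split contract
     * (not the Commons Lang implementation).
     */
    public static String[] splitOnChars(String str, String separatorChars) {
        if (str == null) {
            return null; // null in, null out
        }
        List<String> tokens = new ArrayList<>();
        int start = -1; // index where the current token began, or -1 if none is open
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            // A null separatorChars means "split on whitespace".
            boolean isSeparator = (separatorChars == null)
                    ? Character.isWhitespace(c)
                    : separatorChars.indexOf(c) >= 0;
            if (isSeparator) {
                if (start >= 0) {
                    tokens.add(str.substring(start, i)); // close the open token
                    start = -1;
                }
            } else if (start < 0) {
                start = i; // first non-separator after a run of separators
            }
        }
        if (start >= 0) {
            tokens.add(str.substring(start)); // token that runs to the end of input
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnChars("ab:cd:ef", ":")));  // [ab, cd, ef]
        System.out.println(Arrays.toString(splitOnChars("abc  def", " ")));  // [abc, def]
        System.out.println(Arrays.toString(splitOnChars("abc def", null)));  // [abc, def]
    }
}
{code}

Because separators are collapsed and never emitted, an empty input or an input made only of separators yields an empty array, matching the {{split("", *) = []}} example.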

I'm replacing calls to StringUtil.split with calls to String.split(regex), where 
regex is "[char]+" and char is the split character (always a single character 
at these call sites).
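A minimal sketch of that replacement, under stated assumptions: the helper name charRunRegex is hypothetical, and it goes slightly beyond the plain "[char]+" pattern by backslash-escaping any non-alphanumeric separator so that characters such as '^' or ']' stay literal inside the character class. It also shows where String.split differs from the javadoc contract above: a leading separator yields a leading empty string, an empty input yields [""] rather than [], and a null input throws NullPointerException instead of returning null.

{code:java}
import java.util.Arrays;

public final class CharRunSplit {

    /** Hypothetical helper: regex matching one or more consecutive {@code sep} characters. */
    static String charRunRegex(char sep) {
        // Backslash-escape anything that is not a letter or digit so that characters
        // such as '^', ']', '-' or a backslash are matched literally inside the class.
        String literal = Character.isLetterOrDigit(sep) ? String.valueOf(sep) : "\\" + sep;
        return "[" + literal + "]+";
    }

    public static void main(String[] args) {
        // Cases that match the StringUtil.split examples in the javadoc.
        System.out.println(Arrays.toString("ab:cd:ef".split(charRunRegex(':'))));  // [ab, cd, ef]
        System.out.println(Arrays.toString("abc  def".split(charRunRegex(' ')))); // [abc, def]

        // Cases where String.split behaves differently from StringUtil.split.
        System.out.println(Arrays.toString(":ab::cd:".split(charRunRegex(':')))); // [, ab, cd]
        System.out.println("".split(charRunRegex(':')).length);                   // 1, i.e. [""]
    }
}
{code}

Trailing empty strings are discarded by String.split(regex), but a leading one is kept, so call sites whose input may start with the separator (or be empty or null) would need an extra check to behave exactly as before.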

I'll commit the changes and the StringUtil.java removal in a little bit once 
I've got it compiling and the tests succeed.
> the korean analyzer that has a korean morphological analyzer and dictionaries
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-4956
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4956
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>    Affects Versions: 4.2
>            Reporter: SooMyung Lee
>            Assignee: Christian Moen
>              Labels: newbie
>         Attachments: kr.analyzer.4x.tar
>
>
> The Korean language has specific characteristics. When developing a search 
> service with Lucene & Solr in Korean, there are some problems in searching and 
> indexing. The Korean analyzer solves these problems with a Korean morphological 
> analyzer. It consists of a Korean morphological analyzer, dictionaries, a 
> Korean tokenizer and a Korean filter. The Korean analyzer is made for Lucene 
> and Solr. If you develop a search service with Lucene in Korean, the Korean 
> analyzer is the best choice.

