[
https://issues.apache.org/jira/browse/LUCENE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649404#comment-13649404
]
Steve Rowe edited comment on LUCENE-4956 at 5/5/13 9:32 PM:
------------------------------------------------------------
{quote}
Could you comment about the origins and authorship of
org.apache.lucene.analysis.kr.utils.StringUtil in your tar file?
I'm seeing a lot of authors in this file. Is this from Apache Commons Lang?
Thanks!
{quote}
I looked at the file content, and it's definitely from Apache Commons Lang (the
class is named {{StringUtils}} there, renamed {{StringUtil}} here), circa early
2010, maybe with a little pulled in from another Commons Lang class.
I've eliminated StringUtil. Nearly all uses of it were calls to
StringUtils.split(String, separators), whose javadoc is:
{code:java}
/**
* <p>Splits the provided text into an array, separators specified.
* This is an alternative to using StringTokenizer.</p>
*
* <p>The separator is not included in the returned String array.
* Adjacent separators are treated as one separator.
* For more control over the split use the StrTokenizer class.</p>
*
* <p>A <code>null</code> input String returns <code>null</code>.
* A <code>null</code> separatorChars splits on whitespace.</p>
*
* <pre>
* StringUtil.split(null, *) = null
* StringUtil.split("", *) = []
* StringUtil.split("abc def", null) = ["abc", "def"]
* StringUtil.split("abc def", " ") = ["abc", "def"]
* StringUtil.split("abc  def", " ") = ["abc", "def"]
* StringUtil.split("ab:cd:ef", ":") = ["ab", "cd", "ef"]
* </pre>
*
* @param str the String to parse, may be null
* @param separatorChars the characters used as the delimiters,
* <code>null</code> splits on whitespace
* @return an array of parsed Strings, <code>null</code> if null String input
*/
{code}
I'm replacing calls to this method with calls to String.split(regex), where
regex is "[char]+", and char is the (in all cases singular) split character.
I'll commit the changes and the StringUtil.java removal in a little bit once
I've got it compiling and the tests succeed.
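As a rough illustration of the replacement described above (not the committed patch), the sketch below builds the "[char]+" regex for a single separator character and passes it to String.split. The class name SplitSketch and the helper splitOn are hypothetical, and the sketch assumes the separator is not a character with special meaning inside a regex character class (such as ^, ] or \).

```java
import java.util.Arrays;

class SplitSketch {
    // Hypothetical helper: splits on runs of one separator character,
    // using the "[char]+" regex form mentioned in the comment above.
    // Assumes the separator is not special inside a character class.
    static String[] splitOn(String text, char separator) {
        String regex = "[" + separator + "]+";
        return text.split(regex);
    }

    public static void main(String[] args) {
        // Adjacent separators are treated as one, as in the javadoc.
        System.out.println(Arrays.toString(splitOn("ab:cd:ef", ':'))); // [ab, cd, ef]
        System.out.println(Arrays.toString(splitOn("ab::cd", ':')));   // [ab, cd]

        // Edge cases where String.split behaves differently from the
        // javadoc above: a leading separator yields a leading empty
        // string, and an empty input yields a one-element array.
        System.out.println(Arrays.toString(splitOn(":ab", ':')));      // [, ab]
        System.out.println(splitOn("", ':').length);                   // 1
    }
}
```

The last two cases are worth checking at each call site: per the javadoc, the removed method returns an empty array for empty input, and String.split throws a NullPointerException on a null receiver rather than returning null.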
> the korean analyzer that has a korean morphological analyzer and dictionaries
> -----------------------------------------------------------------------------
>
> Key: LUCENE-4956
> URL: https://issues.apache.org/jira/browse/LUCENE-4956
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/analysis
> Affects Versions: 4.2
> Reporter: SooMyung Lee
> Assignee: Christian Moen
> Labels: newbie
> Attachments: kr.analyzer.4x.tar
>
>
> The Korean language has specific characteristics. When developing a search
> service with Lucene & Solr in Korean, there are some problems in searching
> and indexing. The Korean analyzer solves these problems with a Korean
> morphological analyzer. It consists of a Korean morphological analyzer,
> dictionaries, a Korean tokenizer and a Korean filter. The Korean analyzer is
> made for Lucene and Solr. If you develop a search service with Lucene in
> Korean, it is the best idea to choose the Korean analyzer.