Repository: commons-text
Updated Branches:
  refs/heads/master f2f24aa6f -> d39dbb548

SANDBOX-496 Write user guide

Project: http://git-wip-us.apache.org/repos/asf/commons-text/repo
Commit: http://git-wip-us.apache.org/repos/asf/commons-text/commit/d39dbb54
Tree: http://git-wip-us.apache.org/repos/asf/commons-text/tree/d39dbb54
Diff: http://git-wip-us.apache.org/repos/asf/commons-text/diff/d39dbb54

Branch: refs/heads/master
Commit: d39dbb5485b49a2ba1b9cc93738d8a02caf65ecb
Parents: f2f24aa
Author: Bruno P. Kinoshita <brunodepau...@yahoo.com.br>
Authored: Fri Apr 17 18:19:18 2015 +1200
Committer: Bruno P. Kinoshita <brunodepau...@yahoo.com.br>
Committed: Fri Apr 17 18:19:18 2015 +1200

----------------------------------------------------------------------
 src/changes/changes.xml                         |  1 +
 .../apache/commons/text/names/package-info.java |  3 +++
 .../commons/text/similarity/CosineDistance.java |  1 +
 .../similarity/internal/RegexTokenizer.java     |  2 +-
 .../commons/text/similarity/package-info.java   | 20 ++++++++++++++++++++
 src/site/site.xml                               |  1 -
 src/site/xdoc/index.xml                         | 12 ++++++------
 src/site/xdoc/proposal.xml                      |  2 +-
 8 files changed, 33 insertions(+), 9 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/commons-text/blob/d39dbb54/src/changes/changes.xml
----------------------------------------------------------------------
diff --git a/src/changes/changes.xml b/src/changes/changes.xml
index c4cbe05..2768608 100644
--- a/src/changes/changes.xml
+++ b/src/changes/changes.xml
@@ -22,6 +22,7 @@
 <body>
 <release version="1.0" date="tba" description="tba">
+<action issue="SANDBOX-496" type="add" dev="kinow">Write user guide</action>
 <action issue="SANDBOX-488" type="fix" dev="kinow">Work on the string metric, distance, and similarity definitions for the project</action>
 <action issue="SANDBOX-487" type="add" dev="kinow">Human name parser</action>
 <action issue="SANDBOX-492" type="fix" dev="kinow" due-to="Jonathan baker">Create StringDistanceFrom class that contains a StringMetric and the "left"
 side string. This would have a method that accepts the "right" side string to test.</action>

http://git-wip-us.apache.org/repos/asf/commons-text/blob/d39dbb54/src/main/java/org/apache/commons/text/names/package-info.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/commons/text/names/package-info.java b/src/main/java/org/apache/commons/text/names/package-info.java
index 1423d24..868f6e1 100644
--- a/src/main/java/org/apache/commons/text/names/package-info.java
+++ b/src/main/java/org/apache/commons/text/names/package-info.java
@@ -17,6 +17,9 @@
 /**
  * <p>A human names parser in Java.</p>
  *
+ * <p>The parser can parse different name formats, producing parts of names such as
+ * first and last name, prefix, suffix and nickname.</p>
+ *
  * @since 1.0
  */
 package org.apache.commons.text.names;

http://git-wip-us.apache.org/repos/asf/commons-text/blob/d39dbb54/src/main/java/org/apache/commons/text/similarity/CosineDistance.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/commons/text/similarity/CosineDistance.java b/src/main/java/org/apache/commons/text/similarity/CosineDistance.java
index 98ef49e..d4eeae5 100644
--- a/src/main/java/org/apache/commons/text/similarity/CosineDistance.java
+++ b/src/main/java/org/apache/commons/text/similarity/CosineDistance.java
@@ -28,6 +28,7 @@ import org.apache.commons.text.similarity.internal.Tokenizer;
  * <p>It utilizes the CosineSimilarity to compute the distance. Character sequences
  * are converted into vectors through a simple tokenizer that works with </p>
  *
+ * @see org.apache.commons.text.similarity.internal.RegexTokenizer
  * @since 1.0
  */
 public class CosineDistance implements EditDistance<Double> {

http://git-wip-us.apache.org/repos/asf/commons-text/blob/d39dbb54/src/main/java/org/apache/commons/text/similarity/internal/RegexTokenizer.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/commons/text/similarity/internal/RegexTokenizer.java b/src/main/java/org/apache/commons/text/similarity/internal/RegexTokenizer.java
index cf49536..743baa3 100644
--- a/src/main/java/org/apache/commons/text/similarity/internal/RegexTokenizer.java
+++ b/src/main/java/org/apache/commons/text/similarity/internal/RegexTokenizer.java
@@ -23,7 +23,7 @@ import java.util.regex.Pattern;
 /**
  * A simple word tokenizer that utilizes regex to find words. It applies a regex
- * {@code}(\\w)+{@code} over the input text to extract words from a given character
+ * {@code}(\w)+{@code} over the input text to extract words from a given character
  * sequence.
  *
  * @since 0.1

http://git-wip-us.apache.org/repos/asf/commons-text/blob/d39dbb54/src/main/java/org/apache/commons/text/similarity/package-info.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/commons/text/similarity/package-info.java b/src/main/java/org/apache/commons/text/similarity/package-info.java
index 8e9d478..bd1e400 100644
--- a/src/main/java/org/apache/commons/text/similarity/package-info.java
+++ b/src/main/java/org/apache/commons/text/similarity/package-info.java
@@ -17,6 +17,26 @@
 /**
  * <p>Provides algorithms for string similarity.</p>
  *
+ * <p>The algorithms that implement the EditDistance interface follow the same
+ * simple principle: the more similar (closer) strings are, lower is the distance.
+ * For example, the words house and hose are closer than house and trousers.</p>
+ *
+ * <p>The following algorithms are available at the moment:</p>
+ *
+ * <ul>
+ * <li>{@link org.apache.commons.text.similarity.CosineDistance Cosine Distance}</li>
+ * <li>{@link org.apache.commons.text.similarity.CosineSimilarity Cosine Similarity}</li>
+ * <li>{@link org.apache.commons.text.similarity.FuzzyScore Fuzzy Score}</li>
+ * <li>{@link org.apache.commons.text.similarity.HammingDistance Hamming Distance}</li>
+ * <li>{@link org.apache.commons.text.similarity.JaroWrinklerDistance Jaro-Wrinkler Distance}</li>
+ * <li>{@link org.apache.commons.text.similarity.LevenshteinDistance Levenshtein Distance}</li>
+ * </ul>
+ *
+ * <p>The {@link org.apache.commons.text.similarity.CosineDistance Cosine Distance}
+ * utilises a {@link org.apache.commons.text.similarity.internal.RegexTokenizer regular expression tokenizer (\w+)}.
+ * And the {@link org.apache.commons.text.similarity.LevenshteinDistance Levenshtein Distance}'s
+ * behaviour can be changed to take into consideration a maximum throughput.</p>
+ *
  * @since 1.0
  */
 package org.apache.commons.text.similarity;

http://git-wip-us.apache.org/repos/asf/commons-text/blob/d39dbb54/src/site/site.xml
----------------------------------------------------------------------
diff --git a/src/site/site.xml b/src/site/site.xml
index 53a5911..5ae59d8 100644
--- a/src/site/site.xml
+++ b/src/site/site.xml
@@ -26,7 +26,6 @@
 <menu name="Text">
 <item name="Overview" href="/index.html"/>
 <item name="Download" href="/download_commons-text.cgi"/>
-<item name="Users guide" href="/userguide.html"/>
 <item name="Release History" href="/release-history.html"/>
 <item name="Javadoc (Latest release)" href="javadocs/api-release/index.html"/>
 </menu>

http://git-wip-us.apache.org/repos/asf/commons-text/blob/d39dbb54/src/site/xdoc/index.xml
----------------------------------------------------------------------
diff --git a/src/site/xdoc/index.xml b/src/site/xdoc/index.xml
index c67cff3..44464a5 100644
--- a/src/site/xdoc/index.xml
+++ b/src/site/xdoc/index.xml
@@ -36,8 +36,8 @@ The package descriptions in the <a href="javadocs/api-release/index.html">JavaDo
 and various <a href="project-reports.html">project reports</a> are provided.
 </p>
 <p>
-The <a href="source-repository.html">subversion repository</a> can be
-<a href="http://svn.apache.org/viewvc/commons/proper/lang/trunk/">browsed</a>, or you can browse/contribute via <a href="https://github.com/apache/commons-text">GitHub</a>.
+The <a href="source-repository.html">Git repository</a> can be
+<a href="https://git-wip-us.apache.org/repos/asf?p=commons-text.git">browsed</a>, or you can browse/contribute via <a href="https://github.com/apache/commons-text">GitHub</a>.
 </p>
 <p>
 The code base is monitored by a Sonar instance running on <a href="https://analysis.apache.org/dashboard/index/72046">analysis.apache.org</a>.
@@ -51,11 +51,11 @@ The code base is monitored by a Sonar instance running on <a href="https://analy
 <!-- ================================================== -->
 <section name="Getting Involved">
 <p>
-The <a href="mail-lists.html">commons developer mailing list</a> is the main channel of communication for contributors. Please remember that the lists are shared between all commons components, so prefix your email by [lang]. </p>
+The <a href="mail-lists.html">commons developer mailing list</a> is the main channel of communication for contributors. Please remember that the lists are shared between all commons components, so prefix your email by [text]. </p>
 <p>You can also visit the #apache-commons IRC channel on irc.freenode.net or peruse <a href="issue-tracking.html">JIRA</a>. Specific links of interest for JIRA are:</p>
 <ul>
-<li>Ideas looking for code: <a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20LANG%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%20%22Patch%20Needed%22%20ORDER%20BY%20priority%20DESC">Patch Needed</a></li>
-<li>Issues with patches, looking for reviews: <a href="https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%20%22Review%20Patch%22%20AND%20project%20%3D%20LANG%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC">Review Patch</a></li>
+<li>Ideas looking for code: <a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20TEXT%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%20%22Patch%20Needed%22%20ORDER%20BY%20priority%20DESC">Patch Needed</a></li>
+<li>Issues with patches, looking for reviews: <a href="https://issues.apache.org/jira/issues/?jql=fixVersion%20%3D%20%22Review%20Patch%22%20AND%20project%20%3D%20TEXT%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC">Review Patch</a></li>
 </ul>
 <p>Alternatively you can go through the <em>Needs Work</em> tags in the <a href="taglist.html">TagList report</a>.</p>
 <p>If you'd like to offer up pull requests via GitHub rather than applying patches to JIRA, we have a <a href="https://github.com/apache/commons-text/">GitHub mirror</a>. </p>
@@ -67,7 +67,7 @@ The <a href="mail-lists.html">commons mailing lists</a> act as the main support
 The user list is suitable for most library usage queries.
 The dev list is intended for the development discussion.
 Please remember that the lists are shared between all commons components,
-so prefix your email by [lang].
+so prefix your email by [text].
 </p>
 <p>
 Bug reports and enhancements are also welcomed via the <a href="issue-tracking.html">JIRA</a> issue tracker.

http://git-wip-us.apache.org/repos/asf/commons-text/blob/d39dbb54/src/site/xdoc/proposal.xml
----------------------------------------------------------------------
diff --git a/src/site/xdoc/proposal.xml b/src/site/xdoc/proposal.xml
index 6dcc0a7..1af93b5 100644
--- a/src/site/xdoc/proposal.xml
+++ b/src/site/xdoc/proposal.xml
@@ -78,8 +78,8 @@ implement higher order text processing.</p>
 <p>The initial committers on the <em>Commons Text</em> component shall be as follows:
 <ul>
-<li>Bruno P. Kinoshita (kinow)</li>
 <li>Benedikt Ritter (britter)</li>
+<li>Bruno P. Kinoshita (kinow)</li>
 </ul>
 </p>
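
The principle stated in the new similarity package documentation (the closer two strings are, the lower the distance) can be illustrated with a small standalone sketch. It deliberately does not call the commons-text classes touched by this commit. `EditDistanceSketch` and its `levenshtein` method are hypothetical names introduced only for illustration, and the code is a plain dynamic-programming Levenshtein distance, not necessarily how the library implements it.

```java
/**
 * Illustration only: a hypothetical helper, not part of commons-text.
 * Computes the classic Levenshtein distance with two rolling rows.
 */
public final class EditDistanceSketch {

    private EditDistanceSketch() {
    }

    /** Returns the number of single-character insertions, deletions and substitutions. */
    public static int levenshtein(CharSequence left, CharSequence right) {
        int[] previous = new int[right.length() + 1];
        int[] current = new int[right.length() + 1];

        // Distance from the empty prefix of left to each prefix of right.
        for (int j = 0; j <= right.length(); j++) {
            previous[j] = j;
        }

        for (int i = 1; i <= left.length(); i++) {
            current[0] = i;
            for (int j = 1; j <= right.length(); j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                int insertion = current[j - 1] + 1;
                int deletion = previous[j] + 1;
                int substitution = previous[j - 1] + cost;
                current[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the arrays: the row just computed becomes the previous row.
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[right.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("house", "hose"));     // prints 1
        System.out.println(levenshtein("house", "trousers")); // prints 4
    }
}
```

With the example words from the Javadoc, house and hose are one edit apart while house and trousers are four edits apart, so the first pair is the closer one.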