[
https://issues.apache.org/jira/browse/OPENNLP-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642997#comment-17642997
]
ASF GitHub Bot commented on OPENNLP-1403:
-----------------------------------------
kinow commented on code in PR #445:
URL: https://github.com/apache/opennlp/pull/445#discussion_r1038965583
##########
opennlp-tools/src/main/java/opennlp/tools/langdetect/LanguageDetector.java:
##########
@@ -20,14 +20,29 @@
import java.io.Serializable;
/**
- * The interface for LanguageDetector which provide the @{@link Language}
according to the context.
+ * The interface for {@link LanguageDetector} which predicts the {@link
Language} for a context.
*/
public interface LanguageDetector extends Serializable {
+ /**
+ * Predicts the {@link Language languages} for the full {@code content}
length.
+ *
+ * @param content The textual content to detect potential {@link Language
languages} from.
+ * @return the predicted languages
+ */
Language[] predictLanguages(CharSequence content);
+ /**
+ * Predicts the {@link Language} for the full {@code content} length.
+ *
+ * @param content The textual content to detect potential {@link Language
languages} from.
+ * @return the language with the highest confidence
+ */
Language predictLanguage(CharSequence content);
+ /**
+ * @return Retrieves an array of language (codes) that are supported by a
{@link LanguageDetector}.
Review Comment:
Some @return descriptions start with upper case, others don't. It doesn't
really bother me, but I'm flagging it in case there is a convention or an
intention to standardize it 👍 (no need to change anything if you don't want
to, really).
##########
opennlp-tools/src/main/java/opennlp/tools/languagemodel/NGramLanguageModel.java:
##########
@@ -114,16 +135,30 @@ public StringList predictNextTokens(StringList tokens) {
return token;
}
+ private double calculateProbability(StringList tokens) {
+ double probability = 0d;
+ if (size() > 0) {
+ for (StringList ngram : NGramUtils.getNGrams(tokens, n)) {
+ double score = stupidBackoff(ngram);
+ probability += StrictMath.log(score);
+ if (Double.isNaN(probability)) {
+ probability = 0d;
+ break;
+ }
+ }
+ probability = StrictMath.exp(probability);
+ }
+ return probability;
+ }
+
@Override
public String[] predictNextTokens(String... tokens) {
double maxProb = Double.NEGATIVE_INFINITY;
String[] token = null;
for (StringList ngram : this) {
String[] sequence = new String[ngram.size() + tokens.length];
- for (int i = 0; i < tokens.length; i++) {
- sequence[i] = tokens[i];
- }
+ System.arraycopy(tokens, 0, sequence, 0, tokens.length);
Review Comment:
👏
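For readers skimming this hunk, the following is a minimal, self-contained sketch of why the `System.arraycopy` line can stand in for the removed loop. The class and method names (`ArrayCopySketch`, `prefixWithLoop`, `prefixWithArrayCopy`) and the `extra` parameter are made up for illustration and are not part of the PR.

```java
import java.util.Arrays;

public class ArrayCopySketch {

  // Element-by-element copy, mirroring the removed lines of the hunk.
  // 'extra' stands in for ngram.size() and is a hypothetical parameter.
  static String[] prefixWithLoop(String[] tokens, int extra) {
    String[] sequence = new String[extra + tokens.length];
    for (int i = 0; i < tokens.length; i++) {
      sequence[i] = tokens[i];
    }
    return sequence;
  }

  // Bulk copy, mirroring the added line of the hunk.
  static String[] prefixWithArrayCopy(String[] tokens, int extra) {
    String[] sequence = new String[extra + tokens.length];
    System.arraycopy(tokens, 0, sequence, 0, tokens.length);
    return sequence;
  }

  public static void main(String[] args) {
    String[] tokens = {"the", "quick"};
    // Both variants fill the leading slots and leave the rest null.
    System.out.println(
        Arrays.equals(prefixWithLoop(tokens, 3), prefixWithArrayCopy(tokens, 3)));
  }
}
```

Both helpers produce the same array contents, so the change in the hunk should be purely a readability improvement.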
##########
opennlp-tools/src/main/java/opennlp/tools/languagemodel/NGramLanguageModel.java:
##########
@@ -91,6 +111,7 @@ public double calculateProbability(String... tokens) {
}
@Override
+ @Deprecated
Review Comment:
Is there a recommendation for callers of this deprecated method? What should
they use instead?
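One common way to answer this in the code itself is to pair the `@Deprecated` annotation with a Javadoc `@deprecated` tag that links to the replacement. The sketch below only illustrates that pattern: `DeprecationSketch`, `ExampleModel`, and both `predictNext` overloads are hypothetical and do not reflect which OpenNLP method the PR deprecates or what its replacement is.

```java
import java.util.List;

public class DeprecationSketch {

  /** Hypothetical model type used only for illustration. */
  static class ExampleModel {

    /**
     * Predicts the next token for the given history.
     *
     * @param tokens The token history.
     * @return The predicted token.
     * @deprecated Use {@link #predictNext(String...)} instead.
     */
    @Deprecated
    public String predictNext(List<String> tokens) {
      // The deprecated overload delegates to its replacement.
      return predictNext(tokens.toArray(new String[0]));
    }

    /**
     * Predicts the next token for the given history.
     * Placeholder logic: it simply echoes the last token.
     *
     * @param tokens The token history.
     * @return The predicted token, or an empty string for an empty history.
     */
    public String predictNext(String... tokens) {
      return tokens.length == 0 ? "" : tokens[tokens.length - 1];
    }
  }

  public static void main(String[] args) {
    ExampleModel model = new ExampleModel();
    System.out.println(model.predictNext("a", "b"));
  }
}
```

With the `@deprecated` tag in place, the generated JavaDoc and most IDEs show the suggested replacement right where the deprecation warning appears.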
##########
opennlp-tools/src/main/java/opennlp/tools/langdetect/DefaultLanguageDetectorContextGenerator.java:
##########
@@ -34,11 +34,12 @@ public class DefaultLanguageDetectorContextGenerator
implements LanguageDetector
protected final CharSequenceNormalizer normalizer;
/**
- * Creates a customizable @{@link DefaultLanguageDetectorContextGenerator}
that computes ngrams from text
- * @param minLength min ngrams chars
- * @param maxLength max ngrams chars
- * @param normalizers zero or more normalizers to
- * be applied in to the text before extracting ngrams
+ * Creates a customizable {@link DefaultLanguageDetectorContextGenerator}
that computes ngrams from text.
+ *
+ * @param minLength The min number of ngrams characters. Must be greater
than {@code 0}.
+ * @param maxLength The max number of ngrams characters. Must be greater
than {@code 0}
+ * and must be greater than {@code minLength}.
+ * @param normalizers zero or more normalizers to be applied in to the text
before extracting ngrams.
Review Comment:
Upper case Z in "zero", to match the other @param descriptions?
> Enhance JavaDoc in opennlp.tools.langdetect and opennlp.tools.languagemodel
> packages
> ------------------------------------------------------------------------------------
>
> Key: OPENNLP-1403
> URL: https://issues.apache.org/jira/browse/OPENNLP-1403
> Project: OpenNLP
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 2.1.0
> Reporter: Martin Wiesner
> Priority: Minor
> Fix For: 2.1.1
>
>
> The JavaDoc of the _opennlp.tools.langdetect_ and
> _opennlp.tools.languagemodel_ packages suffers from several inconsistencies
> and missing descriptions. Moreover, it contains several typos that need
> fixing.
> It needs enhancements and/or additions to provide more clarity for readers of
> that part of the OpenNLP API.