[jira] [Commented] (OPENNLP-1403) Enhance JavaDoc in opennlp.tools.langdetect and opennlp.tools.languagemodel packages

ASF GitHub Bot (Jira) Sun, 04 Dec 2022 05:03:03 -0800


    [ https://issues.apache.org/jira/browse/OPENNLP-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642997#comment-17642997 ]


ASF GitHub Bot commented on OPENNLP-1403:
-----------------------------------------

kinow commented on code in PR #445:
URL: https://github.com/apache/opennlp/pull/445#discussion_r1038965583


##########
opennlp-tools/src/main/java/opennlp/tools/langdetect/LanguageDetector.java:
##########
@@ -20,14 +20,29 @@
 import java.io.Serializable;
 
 /**
- * The interface for LanguageDetector which provide the @{@link Language} according to the context.
+ * The interface for {@link LanguageDetector} which predicts the {@link Language} for a context.
  */
 public interface LanguageDetector extends Serializable {
 
+  /**
+   * Predicts the {@link Language languages} for the full {@code content} length.
+   *
+   * @param content The textual content to detect potential {@link Language languages} from.
+   * @return the predicted languages
+   */
   Language[] predictLanguages(CharSequence content);
 
+  /**
+   * Predicts the {@link Language} for the full {@code content} length.
+   *
+   * @param content The textual content to detect potential {@link Language languages} from.
+   * @return the language with the highest confidence
+   */
   Language predictLanguage(CharSequence content);
 
+  /**
+   * @return Retrieves an array of language (codes) that are supported by a {@link LanguageDetector}.

Review Comment:
   Some @return descriptions start with upper case, others don't. It doesn't
   really bother me, but just in case there is a convention or an intention to
   standardize it 👍 (no need to change anything if you don't want to, really).



##########
opennlp-tools/src/main/java/opennlp/tools/languagemodel/NGramLanguageModel.java:
##########
@@ -114,16 +135,30 @@ public StringList predictNextTokens(StringList tokens) {
     return token;
   }
 
+  private double calculateProbability(StringList tokens) {
+    double probability = 0d;
+    if (size() > 0) {
+      for (StringList ngram : NGramUtils.getNGrams(tokens, n)) {
+        double score = stupidBackoff(ngram);
+        probability += StrictMath.log(score);
+        if (Double.isNaN(probability)) {
+          probability = 0d;
+          break;
+        }
+      }
+      probability = StrictMath.exp(probability);
+    }
+    return probability;
+  }
+
   @Override
   public String[] predictNextTokens(String... tokens) {
     double maxProb = Double.NEGATIVE_INFINITY;
     String[] token = null;
 
     for (StringList ngram : this) {
       String[] sequence = new String[ngram.size() + tokens.length];
-      for (int i = 0; i < tokens.length; i++) {
-        sequence[i] = tokens[i];
-      }
+      System.arraycopy(tokens, 0, sequence, 0, tokens.length);

Review Comment:
   👏
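
   For readers of the archive, a minimal sketch of why the two versions in this hunk are interchangeable. The class and method names below are hypothetical and only illustrate the copy; they are not part of OpenNLP.

```java
import java.util.Arrays;

public class ArrayCopyDemo {

  // Manual element-by-element copy, mirroring the removed lines of the hunk.
  public static String[] copyWithLoop(String[] tokens, int extra) {
    String[] sequence = new String[tokens.length + extra];
    for (int i = 0; i < tokens.length; i++) {
      sequence[i] = tokens[i];
    }
    return sequence;
  }

  // Bulk copy, mirroring the added line of the hunk.
  public static String[] copyWithArraycopy(String[] tokens, int extra) {
    String[] sequence = new String[tokens.length + extra];
    // Arguments: source, source offset, destination, destination offset, length.
    System.arraycopy(tokens, 0, sequence, 0, tokens.length);
    return sequence;
  }

  public static void main(String[] args) {
    String[] tokens = {"the", "quick", "brown"};
    String[] viaLoop = copyWithLoop(tokens, 2);
    String[] viaCopy = copyWithArraycopy(tokens, 2);
    // Both arrays hold the three tokens followed by two null slots.
    System.out.println(Arrays.equals(viaLoop, viaCopy)); // prints true
  }
}
```

   The remaining slots stay null in both variants, so the code that later fills in the n-gram tokens is unaffected by the change.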



##########
opennlp-tools/src/main/java/opennlp/tools/languagemodel/NGramLanguageModel.java:
##########
@@ -91,6 +111,7 @@ public double calculateProbability(String... tokens) {
   }
 
   @Override
+  @Deprecated

Review Comment:
   Is there a recommendation for callers of this deprecated method? What
   should they use instead?
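
   One common way to answer this in the code itself is to pair the `@Deprecated` annotation with a `@deprecated` Javadoc tag that links to the replacement. The sketch below is only an illustration: `ToyModel` and both `score` overloads are hypothetical, since the hunk does not show which method is being annotated or what its successor is.

```java
import java.util.Arrays;
import java.util.List;

public class DeprecationDemo {

  /** Hypothetical stand-in for a model class; not the OpenNLP API. */
  public static class ToyModel {

    /**
     * Scores a token sequence.
     *
     * @param tokens the tokens to score
     * @return a toy score, here simply the number of tokens
     * @deprecated use {@link #score(String...)} instead.
     */
    @Deprecated
    public double score(List<String> tokens) {
      // Delegate to the replacement so both entry points stay consistent.
      return score(tokens.toArray(new String[0]));
    }

    /**
     * Scores a token sequence.
     *
     * @param tokens the tokens to score
     * @return a toy score, here simply the number of tokens
     */
    public double score(String... tokens) {
      return tokens.length;
    }
  }

  @SuppressWarnings("deprecation")
  public static void main(String[] args) {
    ToyModel model = new ToyModel();
    // The deprecated and the replacement method agree on the result.
    System.out.println(model.score(Arrays.asList("a", "b")) == model.score("a", "b")); // prints true
  }
}
```

   With the Javadoc tag in place, the generated documentation and most IDEs show the suggested replacement right next to the deprecation warning.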



##########
opennlp-tools/src/main/java/opennlp/tools/langdetect/DefaultLanguageDetectorContextGenerator.java:
##########
@@ -34,11 +34,12 @@ public class DefaultLanguageDetectorContextGenerator implements LanguageDetector
   protected final CharSequenceNormalizer normalizer;
 
   /**
-   * Creates a customizable @{@link DefaultLanguageDetectorContextGenerator} that computes ngrams from text
-   * @param minLength min ngrams chars
-   * @param maxLength max ngrams chars
-   * @param normalizers zero or more normalizers to
-   *                    be applied in to the text before extracting ngrams
+   * Creates a customizable {@link DefaultLanguageDetectorContextGenerator} that computes ngrams from text.
+   *
+   * @param minLength The min number of ngrams characters. Must be greater than {@code 0}.
+   * @param maxLength The max number of ngrams characters. Must be greater than {@code 0}
+   *                  and must be greater than {@code minLength}.
+   * @param normalizers zero or more normalizers to be applied in to the text before extracting ngrams.

Review Comment:
   Upper case Z in "zero", to match the other @param descriptions?
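
   As an aside, the constraints stated in the new `@param` descriptions could be enforced roughly as follows. This is a sketch under assumptions: `NGramBounds` is a hypothetical holder class, and the thread does not show whether or how the real constructor validates its arguments.

```java
public class NGramBounds {

  public final int minLength;
  public final int maxLength;

  /**
   * @param minLength The min number of ngram characters. Must be greater than {@code 0}.
   * @param maxLength The max number of ngram characters. Must be greater than {@code 0}
   *                  and must be greater than {@code minLength}.
   * @throws IllegalArgumentException if one of the constraints is violated
   */
  public NGramBounds(int minLength, int maxLength) {
    if (minLength <= 0) {
      throw new IllegalArgumentException("minLength must be greater than 0");
    }
    if (maxLength <= 0) {
      throw new IllegalArgumentException("maxLength must be greater than 0");
    }
    // Strictly greater, following the wording of the Javadoc in the hunk.
    if (maxLength <= minLength) {
      throw new IllegalArgumentException("maxLength must be greater than minLength");
    }
    this.minLength = minLength;
    this.maxLength = maxLength;
  }
}
```

   Documenting the thrown exception next to the parameter constraints keeps the Javadoc and the runtime behavior easy to compare.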





> Enhance JavaDoc in opennlp.tools.langdetect and opennlp.tools.languagemodel 
> packages
> ------------------------------------------------------------------------------------
>
>                 Key: OPENNLP-1403
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1403
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 2.1.0
>            Reporter: Martin Wiesner
>            Priority: Minor
>             Fix For: 2.1.1
>
>
> The JavaDoc of the _opennlp.tools.langdetect_ and 
> _opennlp.tools.languagemodel_ packages suffers from several inconsistencies 
> and missing descriptions. Moreover, it contains several typos that need 
> correcting.
> It needs enhancements and/or additions to provide more clarity for readers of 
> that part of the OpenNLP API.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
